linux-kernel.vger.kernel.org archive mirror
* [PATCH v8 0/2] TmFifo platform driver for Mellanox BlueField SoC
       [not found] <b143b40446c1870fb8d422b364ead95d54552be9.1527264077.git.lsun@mellanox.com>
@ 2019-01-28 17:28 ` Liming Sun
  2019-01-28 17:28 ` [PATCH v8 1/2] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc Liming Sun
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Liming Sun @ 2019-01-28 17:28 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Arnd Bergmann, David Woods,
	Andy Shevchenko, Darren Hart, Vadim Pasternak
  Cc: Liming Sun, devicetree, linux-kernel, platform-driver-x86

This patch series implements the device-side platform driver support
for the TmFifo on the Mellanox BlueField SoC.

The TmFifo is part of the RShim component. It provides FIFOs to
communicate with an external host machine over USB or PCIe (in the
SmartNIC case). The external host has its own driver to access the
RShim component, which is not covered in this patch series.
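
For reference, each message on the FIFO is an 8-byte header followed by
the payload. Below is a minimal sketch of that header, mirroring the
in-memory layout of union mlxbf_tmfifo_msg_hdr from patch 1/2
(illustrative only; the struct/helper names are made up and the
host-side driver itself is out of scope here):

#include <stdint.h>
#include <arpa/inet.h>		/* ntohs() */

struct tmfifo_msg_hdr {
	uint8_t  type;		/* VIRTIO_ID_NET (1) or VIRTIO_ID_CONSOLE (3) */
	uint16_t len;		/* payload length, big endian on the wire */
	uint8_t  unused[5];	/* reserved, set to 0 */
} __attribute__((packed));

/* Payload length in host byte order. */
static inline uint16_t tmfifo_msg_payload_len(const struct tmfifo_msg_hdr *hdr)
{
	return ntohs(hdr->len);
}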

Previous versions of this patch series were submitted to drivers/soc.
This version (v8) re-submits it to drivers/platform according to the
comments and suggestions received.

Patch v8 1/2 has changes based on comments from Vadim Pasternak
during Mellanox internal review.

Patch v8 2/2 was previously reviewed by Rob Herring, but might need a
second look since the driver code has moved to a new location.

Liming Sun (2):
  platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
  dt-bindings: soc: Add TmFifo binding for Mellanox BlueField SoC

 .../devicetree/bindings/soc/mellanox/tmfifo.txt    |   23 +
 drivers/platform/mellanox/Kconfig                  |   13 +-
 drivers/platform/mellanox/Makefile                 |    1 +
 drivers/platform/mellanox/mlxbf-tmfifo-regs.h      |   67 +
 drivers/platform/mellanox/mlxbf-tmfifo.c           | 1289 ++++++++++++++++++++
 5 files changed, 1392 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/devicetree/bindings/soc/mellanox/tmfifo.txt
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo-regs.h
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo.c

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v8 1/2] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
       [not found] <b143b40446c1870fb8d422b364ead95d54552be9.1527264077.git.lsun@mellanox.com>
  2019-01-28 17:28 ` [PATCH v8 0/2] TmFifo platform driver for Mellanox BlueField SoC Liming Sun
@ 2019-01-28 17:28 ` Liming Sun
  2019-01-29 22:06   ` Andy Shevchenko
  2019-01-30  6:24   ` Vadim Pasternak
  2019-01-28 17:28 ` [PATCH v8 2/2] dt-bindings: soc: Add TmFifo binding for Mellanox BlueField SoC Liming Sun
                   ` (9 subsequent siblings)
  11 siblings, 2 replies; 30+ messages in thread
From: Liming Sun @ 2019-01-28 17:28 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Arnd Bergmann, David Woods,
	Andy Shevchenko, Darren Hart, Vadim Pasternak
  Cc: Liming Sun, devicetree, linux-kernel, platform-driver-x86

This commit adds the TmFifo platform driver for the Mellanox BlueField
SoC. The TmFifo is a shared FIFO which enables an external host machine
to exchange data with the SoC over USB or PCIe. The driver is based on
the virtio framework and enables console and network access.

Reviewed-by: David Woods <dwoods@mellanox.com>
Signed-off-by: Liming Sun <lsun@mellanox.com>
---
 drivers/platform/mellanox/Kconfig             |   13 +-
 drivers/platform/mellanox/Makefile            |    1 +
 drivers/platform/mellanox/mlxbf-tmfifo-regs.h |   67 ++
 drivers/platform/mellanox/mlxbf-tmfifo.c      | 1289 +++++++++++++++++++++++++
 4 files changed, 1369 insertions(+), 1 deletion(-)
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo-regs.h
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo.c

diff --git a/drivers/platform/mellanox/Kconfig b/drivers/platform/mellanox/Kconfig
index cd8a908..a565070 100644
--- a/drivers/platform/mellanox/Kconfig
+++ b/drivers/platform/mellanox/Kconfig
@@ -5,7 +5,7 @@
 
 menuconfig MELLANOX_PLATFORM
 	bool "Platform support for Mellanox hardware"
-	depends on X86 || ARM || COMPILE_TEST
+	depends on X86 || ARM || ARM64 || COMPILE_TEST
 	---help---
 	  Say Y here to get to see options for platform support for
 	  Mellanox systems. This option alone does not add any kernel code.
@@ -34,4 +34,15 @@ config MLXREG_IO
 	  to system resets operation, system reset causes monitoring and some
 	  kinds of mux selection.
 
+config MLXBF_TMFIFO
+	tristate "Mellanox BlueField SoC TmFifo platform driver"
+	depends on ARM64
+	default m
+	select VIRTIO_CONSOLE
+	select VIRTIO_NET
+	help
+	  Say Y here to enable TmFifo support. The TmFifo driver provides
+	  platform driver support for the TmFifo, which supports console
+	  and networking based on the virtio framework.
+
 endif # MELLANOX_PLATFORM
diff --git a/drivers/platform/mellanox/Makefile b/drivers/platform/mellanox/Makefile
index 57074d9c..f0c061d 100644
--- a/drivers/platform/mellanox/Makefile
+++ b/drivers/platform/mellanox/Makefile
@@ -5,3 +5,4 @@
 #
 obj-$(CONFIG_MLXREG_HOTPLUG)	+= mlxreg-hotplug.o
 obj-$(CONFIG_MLXREG_IO) += mlxreg-io.o
+obj-$(CONFIG_MLXBF_TMFIFO)	+= mlxbf-tmfifo.o
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo-regs.h b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
new file mode 100644
index 0000000..90c9c2cf
--- /dev/null
+++ b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
+ */
+
+#ifndef __MLXBF_TMFIFO_REGS_H__
+#define __MLXBF_TMFIFO_REGS_H__
+
+#include <linux/types.h>
+
+#define MLXBF_TMFIFO_TX_DATA 0x0
+
+#define MLXBF_TMFIFO_TX_STS 0x8
+#define MLXBF_TMFIFO_TX_STS__LENGTH 0x0001
+#define MLXBF_TMFIFO_TX_STS__COUNT_SHIFT 0
+#define MLXBF_TMFIFO_TX_STS__COUNT_WIDTH 9
+#define MLXBF_TMFIFO_TX_STS__COUNT_RESET_VAL 0
+#define MLXBF_TMFIFO_TX_STS__COUNT_RMASK 0x1ff
+#define MLXBF_TMFIFO_TX_STS__COUNT_MASK  0x1ff
+
+#define MLXBF_TMFIFO_TX_CTL 0x10
+#define MLXBF_TMFIFO_TX_CTL__LENGTH 0x0001
+#define MLXBF_TMFIFO_TX_CTL__LWM_SHIFT 0
+#define MLXBF_TMFIFO_TX_CTL__LWM_WIDTH 8
+#define MLXBF_TMFIFO_TX_CTL__LWM_RESET_VAL 128
+#define MLXBF_TMFIFO_TX_CTL__LWM_RMASK 0xff
+#define MLXBF_TMFIFO_TX_CTL__LWM_MASK  0xff
+#define MLXBF_TMFIFO_TX_CTL__HWM_SHIFT 8
+#define MLXBF_TMFIFO_TX_CTL__HWM_WIDTH 8
+#define MLXBF_TMFIFO_TX_CTL__HWM_RESET_VAL 128
+#define MLXBF_TMFIFO_TX_CTL__HWM_RMASK 0xff
+#define MLXBF_TMFIFO_TX_CTL__HWM_MASK  0xff00
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_SHIFT 32
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_WIDTH 9
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RESET_VAL 256
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RMASK 0x1ff
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK  0x1ff00000000ULL
+
+#define MLXBF_TMFIFO_RX_DATA 0x0
+
+#define MLXBF_TMFIFO_RX_STS 0x8
+#define MLXBF_TMFIFO_RX_STS__LENGTH 0x0001
+#define MLXBF_TMFIFO_RX_STS__COUNT_SHIFT 0
+#define MLXBF_TMFIFO_RX_STS__COUNT_WIDTH 9
+#define MLXBF_TMFIFO_RX_STS__COUNT_RESET_VAL 0
+#define MLXBF_TMFIFO_RX_STS__COUNT_RMASK 0x1ff
+#define MLXBF_TMFIFO_RX_STS__COUNT_MASK  0x1ff
+
+#define MLXBF_TMFIFO_RX_CTL 0x10
+#define MLXBF_TMFIFO_RX_CTL__LENGTH 0x0001
+#define MLXBF_TMFIFO_RX_CTL__LWM_SHIFT 0
+#define MLXBF_TMFIFO_RX_CTL__LWM_WIDTH 8
+#define MLXBF_TMFIFO_RX_CTL__LWM_RESET_VAL 128
+#define MLXBF_TMFIFO_RX_CTL__LWM_RMASK 0xff
+#define MLXBF_TMFIFO_RX_CTL__LWM_MASK  0xff
+#define MLXBF_TMFIFO_RX_CTL__HWM_SHIFT 8
+#define MLXBF_TMFIFO_RX_CTL__HWM_WIDTH 8
+#define MLXBF_TMFIFO_RX_CTL__HWM_RESET_VAL 128
+#define MLXBF_TMFIFO_RX_CTL__HWM_RMASK 0xff
+#define MLXBF_TMFIFO_RX_CTL__HWM_MASK  0xff00
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_SHIFT 32
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_WIDTH 9
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RESET_VAL 256
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK 0x1ff
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK  0x1ff00000000ULL
+
+#endif /* !defined(__MLXBF_TMFIFO_REGS_H__) */
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo.c b/drivers/platform/mellanox/mlxbf-tmfifo.c
new file mode 100644
index 0000000..c1afe47
--- /dev/null
+++ b/drivers/platform/mellanox/mlxbf-tmfifo.c
@@ -0,0 +1,1289 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Mellanox BlueField SoC TmFifo driver
+ *
+ * Copyright (C) 2019 Mellanox Technologies
+ */
+
+#include <linux/acpi.h>
+#include <linux/bitfield.h>
+#include <linux/cache.h>
+#include <linux/device.h>
+#include <linux/dma-mapping.h>
+#include <linux/efi.h>
+#include <linux/io.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/math64.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/mutex.h>
+#include <linux/platform_device.h>
+#include <linux/resource.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/version.h>
+#include <linux/virtio.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_console.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_net.h>
+#include <linux/virtio_ring.h>
+#include <asm/byteorder.h>
+
+#include "mlxbf-tmfifo-regs.h"
+
+/* Vring size. */
+#define MLXBF_TMFIFO_VRING_SIZE			1024
+
+/* Console Tx buffer size. */
+#define MLXBF_TMFIFO_CONS_TX_BUF_SIZE		(32 * 1024)
+
+/* House-keeping timer interval. */
+static int mlxbf_tmfifo_timer_interval = HZ / 10;
+
+/* Global lock. */
+static DEFINE_MUTEX(mlxbf_tmfifo_lock);
+
+/* Virtual devices sharing the TM FIFO. */
+#define MLXBF_TMFIFO_VDEV_MAX		(VIRTIO_ID_CONSOLE + 1)
+
+/* Struct declaration. */
+struct mlxbf_tmfifo;
+
+/* Structure to maintain the ring state. */
+struct mlxbf_tmfifo_vring {
+	void *va;			/* virtual address */
+	dma_addr_t dma;			/* dma address */
+	struct virtqueue *vq;		/* virtqueue pointer */
+	struct vring_desc *desc;	/* current desc */
+	struct vring_desc *desc_head;	/* current desc head */
+	int cur_len;			/* processed len in current desc */
+	int rem_len;			/* remaining length to be processed */
+	int size;			/* vring size */
+	int align;			/* vring alignment */
+	int id;				/* vring id */
+	int vdev_id;			/* TMFIFO_VDEV_xxx */
+	u32 pkt_len;			/* packet total length */
+	__virtio16 next_avail;		/* next avail desc id */
+	struct mlxbf_tmfifo *fifo;	/* pointer back to the tmfifo */
+};
+
+/* Interrupt types. */
+enum {
+	MLXBF_TM_RX_LWM_IRQ,		/* Rx low water mark irq */
+	MLXBF_TM_RX_HWM_IRQ,		/* Rx high water mark irq */
+	MLXBF_TM_TX_LWM_IRQ,		/* Tx low water mark irq */
+	MLXBF_TM_TX_HWM_IRQ,		/* Tx high water mark irq */
+	MLXBF_TM_IRQ_CNT
+};
+
+/* Ring types (Rx & Tx). */
+enum {
+	MLXBF_TMFIFO_VRING_RX,		/* Rx ring */
+	MLXBF_TMFIFO_VRING_TX,		/* Tx ring */
+	MLXBF_TMFIFO_VRING_NUM
+};
+
+struct mlxbf_tmfifo_vdev {
+	struct virtio_device vdev;	/* virtual device */
+	u8 status;
+	u64 features;
+	union {				/* virtio config space */
+		struct virtio_console_config cons;
+		struct virtio_net_config net;
+	} config;
+	struct mlxbf_tmfifo_vring vrings[MLXBF_TMFIFO_VRING_NUM];
+	u8 *tx_buf;			/* tx buffer */
+	u32 tx_head;			/* tx buffer head */
+	u32 tx_tail;			/* tx buffer tail */
+};
+
+struct mlxbf_tmfifo_irq_info {
+	struct mlxbf_tmfifo *fifo;	/* tmfifo structure */
+	int irq;			/* interrupt number */
+	int index;			/* array index */
+};
+
+/* TMFIFO device structure */
+struct mlxbf_tmfifo {
+	struct mlxbf_tmfifo_vdev *vdev[MLXBF_TMFIFO_VDEV_MAX]; /* devices */
+	struct platform_device *pdev;	/* platform device */
+	struct mutex lock;		/* fifo lock */
+	void __iomem *rx_base;		/* mapped register base */
+	void __iomem *tx_base;		/* mapped register base */
+	int tx_fifo_size;		/* number of entries of the Tx FIFO */
+	int rx_fifo_size;		/* number of entries of the Rx FIFO */
+	unsigned long pend_events;	/* pending bits for deferred process */
+	struct mlxbf_tmfifo_irq_info irq_info[MLXBF_TM_IRQ_CNT]; /* irq info */
+	struct work_struct work;	/* work struct for deferred process */
+	struct timer_list timer;	/* keepalive timer */
+	struct mlxbf_tmfifo_vring *vring[2];	/* current Tx/Rx ring */
+	bool is_ready;			/* ready flag */
+	spinlock_t spin_lock;		/* spin lock */
+};
+
+union mlxbf_tmfifo_msg_hdr {
+	struct {
+		u8 type;		/* message type */
+		__be16 len;		/* payload length */
+		u8 unused[5];		/* reserved, set to 0 */
+	} __packed;
+	u64 data;
+};
+
+/*
+ * Default MAC.
+ * This MAC address will be read from EFI persistent variable if configured.
+ * It can also be reconfigured with standard Linux tools.
+ */
+static u8 mlxbf_tmfifo_net_default_mac[6] = {
+	0x00, 0x1A, 0xCA, 0xFF, 0xFF, 0x01};
+
+/* MTU setting of the virtio-net interface. */
+#define MLXBF_TMFIFO_NET_MTU		1500
+
+/* Maximum L2 header length. */
+#define MLXBF_TMFIFO_NET_L2_OVERHEAD	36
+
+/* Supported virtio-net features. */
+#define MLXBF_TMFIFO_NET_FEATURES	((1UL << VIRTIO_NET_F_MTU) | \
+					 (1UL << VIRTIO_NET_F_STATUS) | \
+					 (1UL << VIRTIO_NET_F_MAC))
+
+/* Return the consumed Tx buffer space. */
+static int mlxbf_tmfifo_vdev_tx_buf_len(struct mlxbf_tmfifo_vdev *vdev)
+{
+	return ((vdev->tx_tail >= vdev->tx_head) ?
+	       (vdev->tx_tail - vdev->tx_head) :
+	       (MLXBF_TMFIFO_CONS_TX_BUF_SIZE - vdev->tx_head + vdev->tx_tail));
+}
+
+/* Return the available Tx buffer space. */
+static int mlxbf_tmfifo_vdev_tx_buf_avail(struct mlxbf_tmfifo_vdev *vdev)
+{
+	return (MLXBF_TMFIFO_CONS_TX_BUF_SIZE - 8 -
+		mlxbf_tmfifo_vdev_tx_buf_len(vdev));
+}
+
+/* Update Tx buffer pointer after pushing data. */
+static void mlxbf_tmfifo_vdev_tx_buf_push(struct mlxbf_tmfifo_vdev *vdev,
+					  u32 len)
+{
+	vdev->tx_tail += len;
+	if (vdev->tx_tail >= MLXBF_TMFIFO_CONS_TX_BUF_SIZE)
+		vdev->tx_tail -= MLXBF_TMFIFO_CONS_TX_BUF_SIZE;
+}
+
+/* Update Tx buffer pointer after popping data. */
+static void mlxbf_tmfifo_vdev_tx_buf_pop(struct mlxbf_tmfifo_vdev *vdev,
+					 u32 len)
+{
+	vdev->tx_head += len;
+	if (vdev->tx_head >= MLXBF_TMFIFO_CONS_TX_BUF_SIZE)
+		vdev->tx_head -= MLXBF_TMFIFO_CONS_TX_BUF_SIZE;
+}
+
+/* Allocate vrings for the fifo. */
+static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
+				     struct mlxbf_tmfifo_vdev *tm_vdev,
+				     int vdev_id)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	dma_addr_t dma;
+	int i, size;
+	void *va;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+		vring->fifo = fifo;
+		vring->size = MLXBF_TMFIFO_VRING_SIZE;
+		vring->align = SMP_CACHE_BYTES;
+		vring->id = i;
+		vring->vdev_id = vdev_id;
+
+		size = PAGE_ALIGN(vring_size(vring->size, vring->align));
+		va = dma_alloc_coherent(tm_vdev->vdev.dev.parent, size, &dma,
+					GFP_KERNEL);
+		if (!va) {
+			dev_err(tm_vdev->vdev.dev.parent,
+				"vring allocation failed\n");
+			return -EINVAL;
+		}
+
+		vring->va = va;
+		vring->dma = dma;
+	}
+
+	return 0;
+}
+
+/* Free vrings of the fifo device. */
+static void mlxbf_tmfifo_free_vrings(struct mlxbf_tmfifo *fifo, int vdev_id)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = fifo->vdev[vdev_id];
+	struct mlxbf_tmfifo_vring *vring;
+	int i, size;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+
+		size = PAGE_ALIGN(vring_size(vring->size, vring->align));
+		if (vring->va) {
+			dma_free_coherent(tm_vdev->vdev.dev.parent, size,
+					  vring->va, vring->dma);
+			vring->va = NULL;
+			if (vring->vq) {
+				vring_del_virtqueue(vring->vq);
+				vring->vq = NULL;
+			}
+		}
+	}
+}
+
+/* Free interrupts of the fifo device. */
+static void mlxbf_tmfifo_free_irqs(struct mlxbf_tmfifo *fifo)
+{
+	int i, irq;
+
+	for (i = 0; i < MLXBF_TM_IRQ_CNT; i++) {
+		irq = fifo->irq_info[i].irq;
+		if (irq) {
+			fifo->irq_info[i].irq = 0;
+			disable_irq(irq);
+			free_irq(irq, (u8 *)fifo + i);
+		}
+	}
+}
+
+/* Interrupt handler. */
+static irqreturn_t mlxbf_tmfifo_irq_handler(int irq, void *arg)
+{
+	struct mlxbf_tmfifo_irq_info *irq_info;
+
+	irq_info = (struct mlxbf_tmfifo_irq_info *)arg;
+
+	if (irq_info->index < MLXBF_TM_IRQ_CNT &&
+	    !test_and_set_bit(irq_info->index, &irq_info->fifo->pend_events))
+		schedule_work(&irq_info->fifo->work);
+
+	return IRQ_HANDLED;
+}
+
+/* Nothing to do for now. */
+static void mlxbf_tmfifo_virtio_dev_release(struct device *dev)
+{
+}
+
+/* Get the next packet descriptor from the vring. */
+static inline struct vring_desc *
+mlxbf_tmfifo_virtio_get_next_desc(struct virtqueue *vq)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	unsigned int idx, head;
+	struct vring *vr;
+
+	vring = (struct mlxbf_tmfifo_vring *)vq->priv;
+	vr = (struct vring *)virtqueue_get_vring(vq);
+
+	if (!vr || vring->next_avail == vr->avail->idx)
+		return NULL;
+
+	idx = vring->next_avail % vr->num;
+	head = vr->avail->ring[idx];
+	BUG_ON(head >= vr->num);
+	vring->next_avail++;
+	return &vr->desc[head];
+}
+
+static inline void mlxbf_tmfifo_virtio_release_desc(
+	struct virtio_device *vdev, struct vring *vr,
+	struct vring_desc *desc, u32 len)
+{
+	unsigned int idx;
+
+	idx = vr->used->idx % vr->num;
+	vr->used->ring[idx].id = desc - vr->desc;
+	vr->used->ring[idx].len = cpu_to_virtio32(vdev, len);
+
+	/* Virtio could poll and check the 'idx' to decide
+	 * whether the desc is done or not. Add a memory
+	 * barrier here to make sure the update above completes
+	 * before updating the idx.
+	 */
+	mb();
+	vr->used->idx++;
+}
+
+/* Get the total length of a descriptor chain. */
+static inline u32 mlxbf_tmfifo_virtio_get_pkt_len(struct virtio_device *vdev,
+						  struct vring_desc *desc,
+						  struct vring *vr)
+{
+	u32 len = 0, idx;
+
+	while (desc) {
+		len += virtio32_to_cpu(vdev, desc->len);
+		if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
+			break;
+		idx = virtio16_to_cpu(vdev, desc->next);
+		desc = &vr->desc[idx];
+	}
+
+	return len;
+}
+
+static void mlxbf_tmfifo_release_pkt(struct virtio_device *vdev,
+				     struct mlxbf_tmfifo_vring *vring,
+				     struct vring_desc **desc)
+{
+	struct vring *vr = (struct vring *)virtqueue_get_vring(vring->vq);
+	struct vring_desc *desc_head;
+	uint32_t pkt_len = 0;
+
+	if (!vr)
+		return;
+
+	if (desc != NULL && *desc != NULL && vring->desc_head != NULL) {
+		desc_head = vring->desc_head;
+		pkt_len = vring->pkt_len;
+	} else {
+		desc_head = mlxbf_tmfifo_virtio_get_next_desc(vring->vq);
+		if (desc_head != NULL) {
+			pkt_len = mlxbf_tmfifo_virtio_get_pkt_len(
+				vdev, desc_head, vr);
+		}
+	}
+
+	if (desc_head != NULL)
+		mlxbf_tmfifo_virtio_release_desc(vdev, vr, desc_head, pkt_len);
+
+	if (desc != NULL)
+		*desc = NULL;
+	vring->pkt_len = 0;
+}
+
+/* House-keeping timer. */
+static void mlxbf_tmfifo_timer(struct timer_list *arg)
+{
+	struct mlxbf_tmfifo *fifo;
+
+	fifo = container_of(arg, struct mlxbf_tmfifo, timer);
+
+	/*
+	 * Wake up the work handler to poll the Rx FIFO in case an interrupt
+	 * is missed or leftover bytes are stuck in the FIFO.
+	 */
+	test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events);
+
+	/*
+	 * Wake up the Tx handler in case virtio has queued too many packets
+	 * and is waiting for buffers to be returned.
+	 */
+	test_and_set_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events);
+
+	schedule_work(&fifo->work);
+
+	mod_timer(&fifo->timer, jiffies + mlxbf_tmfifo_timer_interval);
+}
+
+/* Buffer the console output. */
+static void mlxbf_tmfifo_console_output(struct mlxbf_tmfifo_vdev *cons,
+					struct virtqueue *vq)
+{
+	struct vring *vr = (struct vring *)virtqueue_get_vring(vq);
+	struct vring_desc *head_desc, *desc = NULL;
+	struct virtio_device *vdev = &cons->vdev;
+	u32 len, pkt_len, idx;
+	void *addr;
+
+	for (;;) {
+		head_desc = mlxbf_tmfifo_virtio_get_next_desc(vq);
+		if (head_desc == NULL)
+			break;
+
+		/* Release the packet if no more space. */
+		pkt_len = mlxbf_tmfifo_virtio_get_pkt_len(vdev, head_desc, vr);
+		if (pkt_len > mlxbf_tmfifo_vdev_tx_buf_avail(cons)) {
+			mlxbf_tmfifo_virtio_release_desc(vdev, vr, head_desc,
+							 pkt_len);
+			break;
+		}
+
+		desc = head_desc;
+
+		while (desc != NULL) {
+			addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+			len = virtio32_to_cpu(vdev, desc->len);
+
+			if (len <= MLXBF_TMFIFO_CONS_TX_BUF_SIZE -
+			    cons->tx_tail) {
+				memcpy(cons->tx_buf + cons->tx_tail, addr, len);
+			} else {
+				u32 seg;
+
+				seg = MLXBF_TMFIFO_CONS_TX_BUF_SIZE -
+					cons->tx_tail;
+				memcpy(cons->tx_buf + cons->tx_tail, addr, seg);
+				addr += seg;
+				memcpy(cons->tx_buf, addr, len - seg);
+			}
+			mlxbf_tmfifo_vdev_tx_buf_push(cons, len);
+
+			if (!(virtio16_to_cpu(vdev, desc->flags) &
+			    VRING_DESC_F_NEXT))
+				break;
+			idx = virtio16_to_cpu(vdev, desc->next);
+			desc = &vr->desc[idx];
+		}
+
+		mlxbf_tmfifo_virtio_release_desc(vdev, vr, head_desc, pkt_len);
+	}
+}
+
+/* Rx & Tx processing of a virtual queue. */
+static void mlxbf_tmfifo_virtio_rxtx(struct virtqueue *vq, bool is_rx)
+{
+	int num_avail = 0, hdr_len, tx_reserve;
+	struct mlxbf_tmfifo_vring *vring;
+	struct mlxbf_tmfifo_vdev *cons;
+	struct virtio_device *vdev;
+	struct mlxbf_tmfifo *fifo;
+	struct vring_desc *desc;
+	unsigned long flags;
+	struct vring *vr;
+	u64 sts, data;
+	u32 len, idx;
+	void *addr;
+
+	if (!vq)
+		return;
+
+	vring = (struct mlxbf_tmfifo_vring *)vq->priv;
+	fifo = vring->fifo;
+	vr = (struct vring *)virtqueue_get_vring(vq);
+
+	if (!fifo->vdev[vring->vdev_id])
+		return;
+	vdev = &fifo->vdev[vring->vdev_id]->vdev;
+	cons = fifo->vdev[VIRTIO_ID_CONSOLE];
+
+	/* Don't continue if another vring is running. */
+	if (fifo->vring[is_rx] != NULL && fifo->vring[is_rx] != vring)
+		return;
+
+	/* tx_reserve is used to reserve some FIFO room for the console. */
+	if (vring->vdev_id == VIRTIO_ID_NET) {
+		hdr_len = sizeof(struct virtio_net_hdr);
+		tx_reserve = fifo->tx_fifo_size / 16;
+	} else {
+		BUG_ON(vring->vdev_id != VIRTIO_ID_CONSOLE);
+		hdr_len = 0;
+		tx_reserve = 1;
+	}
+
+	desc = vring->desc;
+
+	while (1) {
+		/* Get available FIFO space. */
+		if (num_avail == 0) {
+			if (is_rx) {
+				/* Get the number of available words in FIFO. */
+				sts = readq(fifo->rx_base +
+					    MLXBF_TMFIFO_RX_STS);
+				num_avail = FIELD_GET(
+					MLXBF_TMFIFO_RX_STS__COUNT_MASK, sts);
+
+				/* Don't continue if nothing in FIFO. */
+				if (num_avail <= 0)
+					break;
+			} else {
+				/* Get available space in FIFO. */
+				sts = readq(fifo->tx_base +
+					    MLXBF_TMFIFO_TX_STS);
+				num_avail = fifo->tx_fifo_size - tx_reserve -
+					FIELD_GET(
+						MLXBF_TMFIFO_TX_STS__COUNT_MASK,
+						sts);
+
+				if (num_avail <= 0)
+					break;
+			}
+		}
+
+		/* Console output always comes from the Tx buffer. */
+		if (!is_rx && vring->vdev_id == VIRTIO_ID_CONSOLE &&
+		    cons != NULL && cons->tx_buf != NULL) {
+			union mlxbf_tmfifo_msg_hdr hdr;
+			int size;
+
+			size = mlxbf_tmfifo_vdev_tx_buf_len(cons);
+			if (num_avail < 2 || size == 0)
+				return;
+			if (size + sizeof(hdr) > num_avail * sizeof(u64))
+				size = num_avail * sizeof(u64) - sizeof(hdr);
+			/* Write header. */
+			hdr.data = 0;
+			hdr.type = VIRTIO_ID_CONSOLE;
+			hdr.len = htons(size);
+			writeq(cpu_to_le64(hdr.data),
+			       fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+
+			spin_lock_irqsave(&fifo->spin_lock, flags);
+			while (size > 0) {
+				addr = cons->tx_buf + cons->tx_head;
+
+				if (cons->tx_head + sizeof(u64) <=
+				    MLXBF_TMFIFO_CONS_TX_BUF_SIZE) {
+					memcpy(&data, addr, sizeof(u64));
+				} else {
+					int partial;
+
+					partial =
+						MLXBF_TMFIFO_CONS_TX_BUF_SIZE -
+						cons->tx_head;
+
+					memcpy(&data, addr, partial);
+					memcpy((u8 *)&data + partial,
+					       cons->tx_buf,
+					       sizeof(u64) - partial);
+				}
+				writeq(data,
+				       fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+
+				if (size >= sizeof(u64)) {
+					mlxbf_tmfifo_vdev_tx_buf_pop(
+						cons, sizeof(u64));
+					size -= sizeof(u64);
+				} else {
+					mlxbf_tmfifo_vdev_tx_buf_pop(
+						cons, size);
+					size = 0;
+				}
+			}
+			spin_unlock_irqrestore(&fifo->spin_lock, flags);
+			return;
+		}
+
+		/* Get the desc of next packet. */
+		if (!desc) {
+			/* Save the head desc of the chain. */
+			vring->desc_head =
+				mlxbf_tmfifo_virtio_get_next_desc(vq);
+			if (!vring->desc_head) {
+				vring->desc = NULL;
+				return;
+			}
+			desc = vring->desc_head;
+			vring->desc = desc;
+
+			if (is_rx && vring->vdev_id == VIRTIO_ID_NET) {
+				struct virtio_net_hdr *net_hdr;
+
+				/* Initialize the packet header. */
+				net_hdr = (struct virtio_net_hdr *)
+					phys_to_virt(virtio64_to_cpu(
+						vdev, desc->addr));
+				memset(net_hdr, 0, sizeof(*net_hdr));
+			}
+		}
+
+		/* Beginning of each packet. */
+		if (vring->pkt_len == 0) {
+			int vdev_id, vring_change = 0;
+			union mlxbf_tmfifo_msg_hdr hdr;
+
+			num_avail--;
+
+			/* Read/Write packet length. */
+			if (is_rx) {
+				hdr.data = readq(fifo->rx_base +
+						 MLXBF_TMFIFO_RX_DATA);
+				hdr.data = le64_to_cpu(hdr.data);
+
+				/* Skip the length 0 packet (keepalive). */
+				if (hdr.len == 0)
+					continue;
+
+				/* Check packet type. */
+				if (hdr.type == VIRTIO_ID_NET) {
+					struct virtio_net_config *config;
+
+					vdev_id = VIRTIO_ID_NET;
+					hdr_len = sizeof(struct virtio_net_hdr);
+					config =
+					    &fifo->vdev[vdev_id]->config.net;
+					if (ntohs(hdr.len) > config->mtu +
+						MLXBF_TMFIFO_NET_L2_OVERHEAD)
+						continue;
+				} else if (hdr.type == VIRTIO_ID_CONSOLE) {
+					vdev_id = VIRTIO_ID_CONSOLE;
+					hdr_len = 0;
+				} else {
+					continue;
+				}
+
+				/*
+				 * Check whether the new packet still belongs
+				 * to this vring or not. If not, update the
+				 * pkt_len of the new vring and return.
+				 */
+				if (vdev_id != vring->vdev_id) {
+					struct mlxbf_tmfifo_vdev *dev2 =
+						fifo->vdev[vdev_id];
+
+					if (!dev2)
+						break;
+					vring->desc = desc;
+					vring =
+					  &dev2->vrings[MLXBF_TMFIFO_VRING_RX];
+					vring_change = 1;
+				}
+				vring->pkt_len = ntohs(hdr.len) + hdr_len;
+			} else {
+				vring->pkt_len =
+					mlxbf_tmfifo_virtio_get_pkt_len(
+						vdev, desc, vr);
+
+				hdr.data = 0;
+				hdr.type = (vring->vdev_id == VIRTIO_ID_NET) ?
+					VIRTIO_ID_NET :
+					VIRTIO_ID_CONSOLE;
+				hdr.len = htons(vring->pkt_len - hdr_len);
+				writeq(cpu_to_le64(hdr.data),
+				       fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+			}
+
+			vring->cur_len = hdr_len;
+			vring->rem_len = vring->pkt_len;
+			fifo->vring[is_rx] = vring;
+
+			if (vring_change)
+				return;
+			continue;
+		}
+
+		/* Check available space in this desc. */
+		len = virtio32_to_cpu(vdev, desc->len);
+		if (len > vring->rem_len)
+			len = vring->rem_len;
+
+		/* Check if the current desc is already done. */
+		if (vring->cur_len == len)
+			goto check_done;
+
+		addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+
+		/* Read a word from FIFO for Rx. */
+		if (is_rx) {
+			data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
+			data = le64_to_cpu(data);
+		}
+
+		if (vring->cur_len + sizeof(u64) <= len) {
+			/* The whole word. */
+			if (is_rx) {
+				memcpy(addr + vring->cur_len, &data,
+				       sizeof(u64));
+			} else {
+				memcpy(&data, addr + vring->cur_len,
+				       sizeof(u64));
+			}
+			vring->cur_len += sizeof(u64);
+		} else {
+			/* Leftover bytes. */
+			BUG_ON(vring->cur_len > len);
+			if (is_rx) {
+				memcpy(addr + vring->cur_len, &data,
+				       len - vring->cur_len);
+			} else {
+				memcpy(&data, addr + vring->cur_len,
+				       len - vring->cur_len);
+			}
+			vring->cur_len = len;
+		}
+
+		/* Write the word into FIFO for Tx. */
+		if (!is_rx) {
+			writeq(cpu_to_le64(data),
+			       fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+		}
+
+		num_avail--;
+
+check_done:
+		/* Check whether this desc is full or completed. */
+		if (vring->cur_len == len) {
+			vring->cur_len = 0;
+			vring->rem_len -= len;
+
+			/* Get the next desc on the chain. */
+			if (vring->rem_len > 0 &&
+			    (virtio16_to_cpu(vdev, desc->flags) &
+						VRING_DESC_F_NEXT)) {
+				idx = virtio16_to_cpu(vdev, desc->next);
+				desc = &vr->desc[idx];
+				continue;
+			}
+
+			/* Done and release the desc. */
+			mlxbf_tmfifo_release_pkt(vdev, vring, &desc);
+			fifo->vring[is_rx] = NULL;
+
+			/* Notify upper layer that packet is done. */
+			spin_lock_irqsave(&fifo->spin_lock, flags);
+			vring_interrupt(0, vq);
+			spin_unlock_irqrestore(&fifo->spin_lock, flags);
+			continue;
+		}
+	}
+
+	/* Save the current desc. */
+	vring->desc = desc;
+}
+
+/* The notify function is called when new buffers are posted. */
+static bool mlxbf_tmfifo_virtio_notify(struct virtqueue *vq)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	struct mlxbf_tmfifo *fifo;
+	unsigned long flags;
+
+	vring = (struct mlxbf_tmfifo_vring *)vq->priv;
+	fifo = vring->fifo;
+
+	/*
+	 * Virtio maintains vrings in pairs: even-numbered rings for Rx
+	 * and odd-numbered rings for Tx.
+	 */
+	if (!(vring->id & 1)) {
+		/* Set the RX HWM bit to start Rx. */
+		if (!test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events))
+			schedule_work(&fifo->work);
+	} else {
+		/*
+		 * The console can make blocking calls with interrupts disabled.
+		 * In that case, the vring needs to be served right away. For
+		 * other cases, just set the TX LWM bit to start Tx in the
+		 * worker handler.
+		 */
+		if (vring->vdev_id == VIRTIO_ID_CONSOLE) {
+			spin_lock_irqsave(&fifo->spin_lock, flags);
+			mlxbf_tmfifo_console_output(
+				fifo->vdev[VIRTIO_ID_CONSOLE], vq);
+			spin_unlock_irqrestore(&fifo->spin_lock, flags);
+			schedule_work(&fifo->work);
+		} else if (!test_and_set_bit(MLXBF_TM_TX_LWM_IRQ,
+					     &fifo->pend_events))
+			schedule_work(&fifo->work);
+	}
+
+	return true;
+}
+
+/* Work handler for Rx and Tx case. */
+static void mlxbf_tmfifo_work_handler(struct work_struct *work)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct mlxbf_tmfifo *fifo;
+	int i;
+
+	fifo = container_of(work, struct mlxbf_tmfifo, work);
+	if (!fifo->is_ready)
+		return;
+
+	mutex_lock(&fifo->lock);
+
+	/* Tx (Send data to the TmFifo). */
+	if (test_and_clear_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events) &&
+		       fifo->irq_info[MLXBF_TM_TX_LWM_IRQ].irq) {
+		for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++) {
+			tm_vdev = fifo->vdev[i];
+			if (tm_vdev != NULL) {
+				mlxbf_tmfifo_virtio_rxtx(
+				    tm_vdev->vrings[MLXBF_TMFIFO_VRING_TX].vq,
+				    false);
+			}
+		}
+	}
+
+	/* Rx (Receive data from the TmFifo). */
+	if (test_and_clear_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events) &&
+		       fifo->irq_info[MLXBF_TM_RX_HWM_IRQ].irq) {
+		for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++) {
+			tm_vdev = fifo->vdev[i];
+			if (tm_vdev != NULL) {
+				mlxbf_tmfifo_virtio_rxtx(
+				    tm_vdev->vrings[MLXBF_TMFIFO_VRING_RX].vq,
+				    true);
+			}
+		}
+	}
+
+	mutex_unlock(&fifo->lock);
+}
+
+/* Get the array of feature bits for this device. */
+static u64 mlxbf_tmfifo_virtio_get_features(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
+	return tm_vdev->features;
+}
+
+/* Confirm device features to use. */
+static int mlxbf_tmfifo_virtio_finalize_features(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
+	tm_vdev->features = vdev->features;
+
+	return 0;
+}
+
+/* Free virtqueues found by find_vqs(). */
+static void mlxbf_tmfifo_virtio_del_vqs(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct mlxbf_tmfifo_vring *vring;
+	struct virtqueue *vq;
+	int i;
+
+	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+
+		/* Release the pending packet. */
+		if (vring->desc != NULL) {
+			mlxbf_tmfifo_release_pkt(&tm_vdev->vdev, vring,
+						 &vring->desc);
+		}
+
+		vq = vring->vq;
+		if (vq) {
+			vring->vq = NULL;
+			vring_del_virtqueue(vq);
+		}
+	}
+}
+
+/* Create and initialize the virtual queues. */
+static int mlxbf_tmfifo_virtio_find_vqs(struct virtio_device *vdev,
+					unsigned int nvqs,
+					struct virtqueue *vqs[],
+					vq_callback_t *callbacks[],
+					const char * const names[],
+					const bool *ctx,
+					struct irq_affinity *desc)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct mlxbf_tmfifo_vring *vring;
+	int i, ret = -EINVAL, size;
+	struct virtqueue *vq;
+
+	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
+	if (nvqs > ARRAY_SIZE(tm_vdev->vrings))
+		return -EINVAL;
+
+	for (i = 0; i < nvqs; ++i) {
+		if (!names[i])
+			goto error;
+		vring = &tm_vdev->vrings[i];
+
+		/* zero vring */
+		size = vring_size(vring->size, vring->align);
+		memset(vring->va, 0, size);
+		vq = vring_new_virtqueue(i, vring->size, vring->align, vdev,
+					 false, false, vring->va,
+					 mlxbf_tmfifo_virtio_notify,
+					 callbacks[i], names[i]);
+		if (!vq) {
+			dev_err(&vdev->dev, "vring_new_virtqueue failed\n");
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		vqs[i] = vq;
+		vring->vq = vq;
+		vq->priv = vring;
+	}
+
+	return 0;
+
+error:
+	mlxbf_tmfifo_virtio_del_vqs(vdev);
+	return ret;
+}
+
+/* Read the status byte. */
+static u8 mlxbf_tmfifo_virtio_get_status(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
+
+	return tm_vdev->status;
+}
+
+/* Write the status byte. */
+static void mlxbf_tmfifo_virtio_set_status(struct virtio_device *vdev,
+					   u8 status)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
+	tm_vdev->status = status;
+}
+
+/* Reset the device. Not much here for now. */
+static void mlxbf_tmfifo_virtio_reset(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
+	tm_vdev->status = 0;
+}
+
+/* Read the value of a configuration field. */
+static void mlxbf_tmfifo_virtio_get(struct virtio_device *vdev,
+			      unsigned int offset,
+			      void *buf,
+			      unsigned int len)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
+
+	if (offset + len > sizeof(tm_vdev->config) || offset + len < len) {
+		dev_err(vdev->dev.parent, "virtio_get access out of bounds\n");
+		return;
+	}
+
+	memcpy(buf, (u8 *)&tm_vdev->config + offset, len);
+}
+
+/* Write the value of a configuration field. */
+static void mlxbf_tmfifo_virtio_set(struct virtio_device *vdev,
+				 unsigned int offset,
+				 const void *buf,
+				 unsigned int len)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
+
+	if (offset + len > sizeof(tm_vdev->config) || offset + len < len) {
+		dev_err(vdev->dev.parent, "virtio_set access out of bounds\n");
+		return;
+	}
+
+	memcpy((u8 *)&tm_vdev->config + offset, buf, len);
+}
+
+/* Virtio config operations. */
+static const struct virtio_config_ops mlxbf_tmfifo_virtio_config_ops = {
+	.get_features = mlxbf_tmfifo_virtio_get_features,
+	.finalize_features = mlxbf_tmfifo_virtio_finalize_features,
+	.find_vqs = mlxbf_tmfifo_virtio_find_vqs,
+	.del_vqs = mlxbf_tmfifo_virtio_del_vqs,
+	.reset = mlxbf_tmfifo_virtio_reset,
+	.set_status = mlxbf_tmfifo_virtio_set_status,
+	.get_status = mlxbf_tmfifo_virtio_get_status,
+	.get = mlxbf_tmfifo_virtio_get,
+	.set = mlxbf_tmfifo_virtio_set,
+};
+
+/* Create vdev type in a tmfifo. */
+int mlxbf_tmfifo_create_vdev(struct mlxbf_tmfifo *fifo, int vdev_id,
+			     u64 features, void *config, u32 size)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	int ret = 0;
+
+	mutex_lock(&fifo->lock);
+
+	tm_vdev = fifo->vdev[vdev_id];
+	if (tm_vdev != NULL) {
+		pr_err("vdev %d already exists\n", vdev_id);
+		ret = -EEXIST;
+		goto already_exist;
+	}
+
+	tm_vdev = kzalloc(sizeof(*tm_vdev), GFP_KERNEL);
+	if (!tm_vdev) {
+		ret = -ENOMEM;
+		goto already_exist;
+	}
+
+	tm_vdev->vdev.id.device = vdev_id;
+	tm_vdev->vdev.config = &mlxbf_tmfifo_virtio_config_ops;
+	tm_vdev->vdev.dev.parent = &fifo->pdev->dev;
+	tm_vdev->vdev.dev.release = mlxbf_tmfifo_virtio_dev_release;
+	tm_vdev->features = features;
+	if (config)
+		memcpy(&tm_vdev->config, config, size);
+	if (mlxbf_tmfifo_alloc_vrings(fifo, tm_vdev, vdev_id)) {
+		pr_err("Unable to allocate vring\n");
+		ret = -ENOMEM;
+		goto alloc_vring_fail;
+	}
+	if (vdev_id == VIRTIO_ID_CONSOLE) {
+		tm_vdev->tx_buf = kmalloc(MLXBF_TMFIFO_CONS_TX_BUF_SIZE,
+					  GFP_KERNEL);
+	}
+	fifo->vdev[vdev_id] = tm_vdev;
+
+	/* Register the virtio device. */
+	ret = register_virtio_device(&tm_vdev->vdev);
+	if (ret) {
+		dev_err(&fifo->pdev->dev, "register_virtio_device() failed\n");
+		goto register_fail;
+	}
+
+	mutex_unlock(&fifo->lock);
+	return 0;
+
+register_fail:
+	mlxbf_tmfifo_free_vrings(fifo, vdev_id);
+	fifo->vdev[vdev_id] = NULL;
+alloc_vring_fail:
+	kfree(tm_vdev);
+already_exist:
+	mutex_unlock(&fifo->lock);
+	return ret;
+}
+
+/* Delete vdev type from a tmfifo. */
+int mlxbf_tmfifo_delete_vdev(struct mlxbf_tmfifo *fifo, int vdev_id)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	mutex_lock(&fifo->lock);
+
+	/* Unregister vdev. */
+	tm_vdev = fifo->vdev[vdev_id];
+	if (tm_vdev) {
+		unregister_virtio_device(&tm_vdev->vdev);
+		mlxbf_tmfifo_free_vrings(fifo, vdev_id);
+		kfree(tm_vdev->tx_buf);
+		kfree(tm_vdev);
+		fifo->vdev[vdev_id] = NULL;
+	}
+
+	mutex_unlock(&fifo->lock);
+
+	return 0;
+}
+
+/* Device remove function. */
+static int mlxbf_tmfifo_remove(struct platform_device *pdev)
+{
+	struct mlxbf_tmfifo *fifo = platform_get_drvdata(pdev);
+	struct resource *rx_res, *tx_res;
+	int i;
+
+	if (fifo) {
+		mutex_lock(&mlxbf_tmfifo_lock);
+
+		fifo->is_ready = false;
+
+		/* Stop the timer. */
+		del_timer_sync(&fifo->timer);
+
+		/* Release interrupts. */
+		mlxbf_tmfifo_free_irqs(fifo);
+
+		/* Cancel the pending work. */
+		cancel_work_sync(&fifo->work);
+
+		for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++)
+			mlxbf_tmfifo_delete_vdev(fifo, i);
+
+		/* Release IO resources. */
+		if (fifo->rx_base)
+			iounmap(fifo->rx_base);
+		if (fifo->tx_base)
+			iounmap(fifo->tx_base);
+
+		platform_set_drvdata(pdev, NULL);
+		kfree(fifo);
+
+		mutex_unlock(&mlxbf_tmfifo_lock);
+	}
+
+	rx_res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (rx_res)
+		release_mem_region(rx_res->start, resource_size(rx_res));
+	tx_res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+	if (tx_res)
+		release_mem_region(tx_res->start, resource_size(tx_res));
+
+	return 0;
+}
+
+/* Read the configured network MAC address from efi variable. */
+static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
+{
+	efi_char16_t name[] = {
+		'R', 's', 'h', 'i', 'm', 'M', 'a', 'c', 'A', 'd', 'd', 'r', 0 };
+	efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
+	efi_status_t status;
+	unsigned long size;
+	u8 buf[6];
+
+	size = sizeof(buf);
+	status = efi.get_variable(name, &guid, NULL, &size, buf);
+	if (status == EFI_SUCCESS && size == sizeof(buf))
+		memcpy(mac, buf, sizeof(buf));
+}
+
+/* Probe the TMFIFO. */
+static int mlxbf_tmfifo_probe(struct platform_device *pdev)
+{
+	struct virtio_net_config net_config;
+	struct resource *rx_res, *tx_res;
+	struct mlxbf_tmfifo *fifo;
+	int i, ret;
+	u64 ctl;
+
+	/* Get the resource of the Rx & Tx FIFO. */
+	rx_res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	tx_res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+	if (!rx_res || !tx_res) {
+		ret = -EINVAL;
+		goto err;
+	}
+
+	if (request_mem_region(rx_res->start,
+			       resource_size(rx_res), "bf-tmfifo") == NULL) {
+		ret = -EBUSY;
+		goto early_err;
+	}
+
+	if (request_mem_region(tx_res->start,
+			       resource_size(tx_res), "bf-tmfifo") == NULL) {
+		release_mem_region(rx_res->start, resource_size(rx_res));
+		ret = -EBUSY;
+		goto early_err;
+	}
+
+	ret = -ENOMEM;
+	fifo = kzalloc(sizeof(struct mlxbf_tmfifo), GFP_KERNEL);
+	if (!fifo)
+		goto err;
+
+	fifo->pdev = pdev;
+	platform_set_drvdata(pdev, fifo);
+
+	spin_lock_init(&fifo->spin_lock);
+	INIT_WORK(&fifo->work, mlxbf_tmfifo_work_handler);
+
+	timer_setup(&fifo->timer, mlxbf_tmfifo_timer, 0);
+	fifo->timer.function = mlxbf_tmfifo_timer;
+
+	for (i = 0; i < MLXBF_TM_IRQ_CNT; i++) {
+		fifo->irq_info[i].index = i;
+		fifo->irq_info[i].fifo = fifo;
+		fifo->irq_info[i].irq = platform_get_irq(pdev, i);
+		ret = request_irq(fifo->irq_info[i].irq,
+				  mlxbf_tmfifo_irq_handler, 0,
+				  "tmfifo", &fifo->irq_info[i]);
+		if (ret) {
+			pr_err("Unable to request irq\n");
+			fifo->irq_info[i].irq = 0;
+			goto err;
+		}
+	}
+
+	fifo->rx_base = ioremap(rx_res->start, resource_size(rx_res));
+	if (!fifo->rx_base)
+		goto err;
+
+	fifo->tx_base = ioremap(tx_res->start, resource_size(tx_res));
+	if (!fifo->tx_base)
+		goto err;
+
+	/* Get Tx FIFO size and set the low/high watermark. */
+	ctl = readq(fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
+	fifo->tx_fifo_size =
+		FIELD_GET(MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK, ctl);
+	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__LWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__LWM_MASK,
+			   fifo->tx_fifo_size / 2);
+	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__HWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__HWM_MASK,
+			   fifo->tx_fifo_size - 1);
+	writeq(ctl, fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
+
+	/* Get Rx FIFO size and set the low/high watermark. */
+	ctl = readq(fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
+	fifo->rx_fifo_size =
+		FIELD_GET(MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK, ctl);
+	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__LWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__LWM_MASK, 0);
+	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__HWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__HWM_MASK, 1);
+	writeq(ctl, fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
+
+	mutex_init(&fifo->lock);
+
+	/* Create the console vdev. */
+	ret = mlxbf_tmfifo_create_vdev(fifo, VIRTIO_ID_CONSOLE, 0, NULL, 0);
+	if (ret)
+		goto err;
+
+	/* Create the network vdev. */
+	memset(&net_config, 0, sizeof(net_config));
+	net_config.mtu = MLXBF_TMFIFO_NET_MTU;
+	net_config.status = VIRTIO_NET_S_LINK_UP;
+	memcpy(net_config.mac, mlxbf_tmfifo_net_default_mac, 6);
+	mlxbf_tmfifo_get_cfg_mac(net_config.mac);
+	ret = mlxbf_tmfifo_create_vdev(fifo, VIRTIO_ID_NET,
+		MLXBF_TMFIFO_NET_FEATURES, &net_config, sizeof(net_config));
+	if (ret)
+		goto err;
+
+	mod_timer(&fifo->timer, jiffies + mlxbf_tmfifo_timer_interval);
+
+	fifo->is_ready = true;
+
+	return 0;
+
+err:
+	mlxbf_tmfifo_remove(pdev);
+early_err:
+	dev_err(&pdev->dev, "Probe Failed\n");
+	return ret;
+}
+
+static const struct of_device_id mlxbf_tmfifo_match[] = {
+	{ .compatible = "mellanox,bf-tmfifo" },
+	{},
+};
+MODULE_DEVICE_TABLE(of, mlxbf_tmfifo_match);
+
+static const struct acpi_device_id mlxbf_tmfifo_acpi_match[] = {
+	{ "MLNXBF01", 0 },
+	{},
+};
+MODULE_DEVICE_TABLE(acpi, mlxbf_tmfifo_acpi_match);
+
+static struct platform_driver mlxbf_tmfifo_driver = {
+	.probe = mlxbf_tmfifo_probe,
+	.remove = mlxbf_tmfifo_remove,
+	.driver = {
+		.name = "bf-tmfifo",
+		.of_match_table = mlxbf_tmfifo_match,
+		.acpi_match_table = ACPI_PTR(mlxbf_tmfifo_acpi_match),
+	},
+};
+
+module_platform_driver(mlxbf_tmfifo_driver);
+
+MODULE_DESCRIPTION("Mellanox BlueField SoC TmFifo Driver");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mellanox Technologies");
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v8 2/2] dt-bindings: soc: Add TmFifo binding for Mellanox BlueField SoC
       [not found] <b143b40446c1870fb8d422b364ead95d54552be9.1527264077.git.lsun@mellanox.com>
  2019-01-28 17:28 ` [PATCH v8 0/2] TmFifo platform driver for Mellanox BlueField SoC Liming Sun
  2019-01-28 17:28 ` [PATCH v8 1/2] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc Liming Sun
@ 2019-01-28 17:28 ` Liming Sun
  2019-02-13 13:27 ` [PATCH v9] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc Liming Sun
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Liming Sun @ 2019-01-28 17:28 UTC (permalink / raw)
  To: Rob Herring, Mark Rutland, Arnd Bergmann, David Woods,
	Andy Shevchenko, Darren Hart, Vadim Pasternak
  Cc: Liming Sun, devicetree, linux-kernel, platform-driver-x86

Add devicetree bindings for the TmFifo which is found on Mellanox
BlueField SoCs.

Reviewed-by: Rob Herring <robh@kernel.org>
Reviewed-by: David Woods <dwoods@mellanox.com>
Signed-off-by: Liming Sun <lsun@mellanox.com>
---
 .../devicetree/bindings/soc/mellanox/tmfifo.txt    | 23 ++++++++++++++++++++++
 1 file changed, 23 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/soc/mellanox/tmfifo.txt

diff --git a/Documentation/devicetree/bindings/soc/mellanox/tmfifo.txt b/Documentation/devicetree/bindings/soc/mellanox/tmfifo.txt
new file mode 100644
index 0000000..8a13fa6
--- /dev/null
+++ b/Documentation/devicetree/bindings/soc/mellanox/tmfifo.txt
@@ -0,0 +1,23 @@
+* Mellanox BlueField SoC TmFifo
+
+The BlueField TmFifo provides a shared FIFO between the target and an
+external host machine, which the external host can access via USB or
+PCIe. The current tmfifo driver demuxes this FIFO to implement a
+virtual console and a network interface based on the virtio
+framework.
+
+Required properties:
+
+- compatible:	Should be "mellanox,bf-tmfifo"
+- reg:		Physical base address and length of Rx/Tx block
+- interrupts:	The interrupt numbers of the Rx low water mark, Rx high water
+		mark, Tx low water mark and Tx high water mark, respectively.
+
+Example:
+
+tmfifo@800a20 {
+	compatible = "mellanox,bf-tmfifo";
+	reg = <0x00800a20 0x00000018
+	       0x00800a40 0x00000018>;
+	interrupts = <41 42 43 44>;
+};
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v8 1/2] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
  2019-01-28 17:28 ` [PATCH v8 1/2] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc Liming Sun
@ 2019-01-29 22:06   ` Andy Shevchenko
  2019-02-13 16:33     ` Liming Sun
  2019-01-30  6:24   ` Vadim Pasternak
  1 sibling, 1 reply; 30+ messages in thread
From: Andy Shevchenko @ 2019-01-29 22:06 UTC (permalink / raw)
  To: Liming Sun
  Cc: Rob Herring, Mark Rutland, Arnd Bergmann, David Woods,
	Andy Shevchenko, Darren Hart, Vadim Pasternak, devicetree,
	Linux Kernel Mailing List, Platform Driver

On Mon, Jan 28, 2019 at 7:28 PM Liming Sun <lsun@mellanox.com> wrote:
>
> This commit adds the TmFifo platform driver for Mellanox BlueField
> Soc. TmFifo is a shared FIFO which enables external host machine
> to exchange data with the SoC via USB or PCIe. The driver is based
> on virtio framework and has console and network access enabled.
>
> Reviewed-by: David Woods <dwoods@mellanox.com>
> Signed-off-by: Liming Sun <lsun@mellanox.com>


Please go through this series taking into account the review I just did
for your other patch.

On top of that, look at which modern APIs recent drivers (from the last
few years) are using, e.g. the devm_* family.
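
For illustration only, the probe path with managed (devm_*) resources
might look roughly like the sketch below. It assumes the structures and
handlers already defined in the patch, and it omits error unwinding and
the rest of the setup (watermark programming, timer, vdev creation):

static int mlxbf_tmfifo_probe(struct platform_device *pdev)
{
	struct mlxbf_tmfifo *fifo;
	struct resource *res;
	int i, irq, ret;

	fifo = devm_kzalloc(&pdev->dev, sizeof(*fifo), GFP_KERNEL);
	if (!fifo)
		return -ENOMEM;

	/* devm_ioremap_resource() also claims the memory region. */
	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
	fifo->rx_base = devm_ioremap_resource(&pdev->dev, res);
	if (IS_ERR(fifo->rx_base))
		return PTR_ERR(fifo->rx_base);

	res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
	fifo->tx_base = devm_ioremap_resource(&pdev->dev, res);
	if (IS_ERR(fifo->tx_base))
		return PTR_ERR(fifo->tx_base);

	/* Interrupts are freed automatically on driver detach. */
	for (i = 0; i < MLXBF_TM_IRQ_CNT; i++) {
		irq = platform_get_irq(pdev, i);
		if (irq < 0)
			return irq;
		fifo->irq_info[i].index = i;
		fifo->irq_info[i].fifo = fifo;
		fifo->irq_info[i].irq = irq;
		ret = devm_request_irq(&pdev->dev, irq,
				       mlxbf_tmfifo_irq_handler, 0,
				       "tmfifo", &fifo->irq_info[i]);
		if (ret)
			return ret;
	}

	platform_set_drvdata(pdev, fifo);

	/* ... watermark setup, timer and vdev creation as in the patch ... */

	return 0;
}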

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v8 1/2] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
  2019-01-28 17:28 ` [PATCH v8 1/2] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc Liming Sun
  2019-01-29 22:06   ` Andy Shevchenko
@ 2019-01-30  6:24   ` Vadim Pasternak
  1 sibling, 0 replies; 30+ messages in thread
From: Vadim Pasternak @ 2019-01-30  6:24 UTC (permalink / raw)
  To: Liming Sun, Rob Herring, Mark Rutland, Arnd Bergmann,
	David Woods, Andy Shevchenko, Darren Hart
  Cc: Liming Sun, devicetree, linux-kernel, platform-driver-x86



> -----Original Message-----
> From: Liming Sun <lsun@mellanox.com>
> Sent: Monday, January 28, 2019 7:28 PM
> To: Rob Herring <robh+dt@kernel.org>; Mark Rutland
> <mark.rutland@arm.com>; Arnd Bergmann <arnd@arndb.de>; David Woods
> <dwoods@mellanox.com>; Andy Shevchenko <andy@infradead.org>; Darren
> Hart <dvhart@infradead.org>; Vadim Pasternak <vadimp@mellanox.com>
> Cc: Liming Sun <lsun@mellanox.com>; devicetree@vger.kernel.org; linux-
> kernel@vger.kernel.org; platform-driver-x86@vger.kernel.org
> Subject: [PATCH v8 1/2] platform/mellanox: Add TmFifo driver for Mellanox
> BlueField Soc
> 
> This commit adds the TmFifo platform driver for Mellanox BlueField Soc. TmFifo
> is a shared FIFO which enables external host machine to exchange data with the
> SoC via USB or PCIe. The driver is based on virtio framework and has console
> and network access enabled.
> 
> Reviewed-by: David Woods <dwoods@mellanox.com>
> Signed-off-by: Liming Sun <lsun@mellanox.com>
> ---
>  drivers/platform/mellanox/Kconfig             |   13 +-
>  drivers/platform/mellanox/Makefile            |    1 +
>  drivers/platform/mellanox/mlxbf-tmfifo-regs.h |   67 ++
>  drivers/platform/mellanox/mlxbf-tmfifo.c      | 1289
> +++++++++++++++++++++++++
>  4 files changed, 1369 insertions(+), 1 deletion(-)  create mode 100644
> drivers/platform/mellanox/mlxbf-tmfifo-regs.h
>  create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo.c
> 
> diff --git a/drivers/platform/mellanox/Kconfig
> b/drivers/platform/mellanox/Kconfig
> index cd8a908..a565070 100644
> --- a/drivers/platform/mellanox/Kconfig
> +++ b/drivers/platform/mellanox/Kconfig
> @@ -5,7 +5,7 @@
> 
>  menuconfig MELLANOX_PLATFORM
>  	bool "Platform support for Mellanox hardware"
> -	depends on X86 || ARM || COMPILE_TEST
> +	depends on X86 || ARM || ARM64 || COMPILE_TEST
>  	---help---
>  	  Say Y here to get to see options for platform support for
>  	  Mellanox systems. This option alone does not add any kernel code.
> @@ -34,4 +34,15 @@ config MLXREG_IO
>  	  to system resets operation, system reset causes monitoring and some
>  	  kinds of mux selection.
> 
> +config MLXBF_TMFIFO
> +	tristate "Mellanox BlueField SoC TmFifo platform driver"
> +	depends on ARM64

Why did you make it depend on ARM64?
Shouldn't it work on any host, e.g. x86?

> +	default m

A user who needs it should select this option.
There is no need for 'default m'.

> +	select VIRTIO_CONSOLE
> +	select VIRTIO_NET
> +	help
> +	  Say y here to enable TmFifo support. The TmFifo driver provides
> +          platform driver support for the TmFifo which supports console
> +          and networking based on the virtio framework.
> +
>  endif # MELLANOX_PLATFORM
> diff --git a/drivers/platform/mellanox/Makefile
> b/drivers/platform/mellanox/Makefile
> index 57074d9c..f0c061d 100644
> --- a/drivers/platform/mellanox/Makefile
> +++ b/drivers/platform/mellanox/Makefile
> @@ -5,3 +5,4 @@
>  #
>  obj-$(CONFIG_MLXREG_HOTPLUG)	+= mlxreg-hotplug.o
>  obj-$(CONFIG_MLXREG_IO) += mlxreg-io.o
> +obj-$(CONFIG_MLXBF_TMFIFO)	+= mlxbf-tmfifo.o
> diff --git a/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
> b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
> new file mode 100644
> index 0000000..90c9c2cf
> --- /dev/null
> +++ b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
> @@ -0,0 +1,67 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
> + */
> +
> +#ifndef __MLXBF_TMFIFO_REGS_H__
> +#define __MLXBF_TMFIFO_REGS_H__
> +
> +#include <linux/types.h>
> +
> +#define MLXBF_TMFIFO_TX_DATA 0x0
> +
> +#define MLXBF_TMFIFO_TX_STS 0x8
> +#define MLXBF_TMFIFO_TX_STS__LENGTH 0x0001
> +#define MLXBF_TMFIFO_TX_STS__COUNT_SHIFT 0
> +#define MLXBF_TMFIFO_TX_STS__COUNT_WIDTH 9
> +#define MLXBF_TMFIFO_TX_STS__COUNT_RESET_VAL 0
> +#define MLXBF_TMFIFO_TX_STS__COUNT_RMASK 0x1ff
> +#define MLXBF_TMFIFO_TX_STS__COUNT_MASK  0x1ff
> +
> +#define MLXBF_TMFIFO_TX_CTL 0x10
> +#define MLXBF_TMFIFO_TX_CTL__LENGTH 0x0001
> +#define MLXBF_TMFIFO_TX_CTL__LWM_SHIFT 0
> +#define MLXBF_TMFIFO_TX_CTL__LWM_WIDTH 8
> +#define MLXBF_TMFIFO_TX_CTL__LWM_RESET_VAL 128
> +#define MLXBF_TMFIFO_TX_CTL__LWM_RMASK 0xff
> +#define MLXBF_TMFIFO_TX_CTL__LWM_MASK  0xff
> +#define MLXBF_TMFIFO_TX_CTL__HWM_SHIFT 8
> +#define MLXBF_TMFIFO_TX_CTL__HWM_WIDTH 8
> +#define MLXBF_TMFIFO_TX_CTL__HWM_RESET_VAL 128
> +#define MLXBF_TMFIFO_TX_CTL__HWM_RMASK 0xff
> +#define MLXBF_TMFIFO_TX_CTL__HWM_MASK  0xff00
> +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_SHIFT 32
> +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_WIDTH 9
> +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RESET_VAL 256
> +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RMASK 0x1ff
> +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK  0x1ff00000000ULL
> +
> +#define MLXBF_TMFIFO_RX_DATA 0x0
> +
> +#define MLXBF_TMFIFO_RX_STS 0x8
> +#define MLXBF_TMFIFO_RX_STS__LENGTH 0x0001
> +#define MLXBF_TMFIFO_RX_STS__COUNT_SHIFT 0
> +#define MLXBF_TMFIFO_RX_STS__COUNT_WIDTH 9
> +#define MLXBF_TMFIFO_RX_STS__COUNT_RESET_VAL 0
> +#define MLXBF_TMFIFO_RX_STS__COUNT_RMASK 0x1ff
> +#define MLXBF_TMFIFO_RX_STS__COUNT_MASK  0x1ff
> +
> +#define MLXBF_TMFIFO_RX_CTL 0x10
> +#define MLXBF_TMFIFO_RX_CTL__LENGTH 0x0001
> +#define MLXBF_TMFIFO_RX_CTL__LWM_SHIFT 0
> +#define MLXBF_TMFIFO_RX_CTL__LWM_WIDTH 8
> +#define MLXBF_TMFIFO_RX_CTL__LWM_RESET_VAL 128
> +#define MLXBF_TMFIFO_RX_CTL__LWM_RMASK 0xff
> +#define MLXBF_TMFIFO_RX_CTL__LWM_MASK  0xff
> +#define MLXBF_TMFIFO_RX_CTL__HWM_SHIFT 8
> +#define MLXBF_TMFIFO_RX_CTL__HWM_WIDTH 8
> +#define MLXBF_TMFIFO_RX_CTL__HWM_RESET_VAL 128
> +#define MLXBF_TMFIFO_RX_CTL__HWM_RMASK 0xff
> +#define MLXBF_TMFIFO_RX_CTL__HWM_MASK  0xff00
> +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_SHIFT 32
> +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_WIDTH 9
> +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RESET_VAL 256
> +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK 0x1ff
> +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK  0x1ff00000000ULL
> +
> +#endif /* !defined(__MLXBF_TMFIFO_REGS_H__) */
> diff --git a/drivers/platform/mellanox/mlxbf-tmfifo.c
> b/drivers/platform/mellanox/mlxbf-tmfifo.c
> new file mode 100644
> index 0000000..c1afe47
> --- /dev/null
> +++ b/drivers/platform/mellanox/mlxbf-tmfifo.c
> @@ -0,0 +1,1289 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Mellanox BlueField SoC TmFifo driver
> + *
> + * Copyright (C) 2019 Mellanox Technologies
> + */
> +
> +#include <linux/acpi.h>
> +#include <linux/bitfield.h>
> +#include <linux/cache.h>
> +#include <linux/device.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/efi.h>
> +#include <linux/io.h>
> +#include <linux/interrupt.h>
> +#include <linux/irq.h>
> +#include <linux/kernel.h>
> +#include <linux/math64.h>
> +#include <linux/module.h>
> +#include <linux/moduleparam.h>
> +#include <linux/mutex.h>
> +#include <linux/platform_device.h>
> +#include <linux/resource.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +#include <linux/version.h>
> +#include <linux/virtio.h>
> +#include <linux/virtio_config.h>
> +#include <linux/virtio_console.h>
> +#include <linux/virtio_ids.h>
> +#include <linux/virtio_net.h>
> +#include <linux/virtio_ring.h>
> +#include <asm/byteorder.h>

Is it a must to include from asm?
Could it be replaced with something like
#include <linux/byteorder/generic.h>?

> +
> +#include "mlxbf-tmfifo-regs.h"
> +
> +/* Vring size. */
> +#define MLXBF_TMFIFO_VRING_SIZE			1024
> +
> +/* Console Tx buffer size. */
> +#define MLXBF_TMFIFO_CONS_TX_BUF_SIZE		(32 * 1024)
> +
> +/* House-keeping timer interval. */
> +static int mlxbf_tmfifo_timer_interval = HZ / 10;
> +
> +/* Global lock. */
> +static DEFINE_MUTEX(mlxbf_tmfifo_lock);
> +
> +/* Virtual devices sharing the TM FIFO. */
> +#define MLXBF_TMFIFO_VDEV_MAX		(VIRTIO_ID_CONSOLE + 1)
> +
> +/* Struct declaration. */
> +struct mlxbf_tmfifo;
> +
> +/* Structure to maintain the ring state. */ struct mlxbf_tmfifo_vring {
> +	void *va;			/* virtual address */
> +	dma_addr_t dma;			/* dma address */
> +	struct virtqueue *vq;		/* virtqueue pointer */
> +	struct vring_desc *desc;	/* current desc */
> +	struct vring_desc *desc_head;	/* current desc head */
> +	int cur_len;			/* processed len in current desc */
> +	int rem_len;			/* remaining length to be processed */
> +	int size;			/* vring size */
> +	int align;			/* vring alignment */
> +	int id;				/* vring id */
> +	int vdev_id;			/* TMFIFO_VDEV_xxx */
> +	u32 pkt_len;			/* packet total length */
> +	__virtio16 next_avail;		/* next avail desc id */
> +	struct mlxbf_tmfifo *fifo;	/* pointer back to the tmfifo */
> +};
> +
> +/* Interrupt types. */
> +enum {
> +	MLXBF_TM_RX_LWM_IRQ,		/* Rx low water mark irq */
> +	MLXBF_TM_RX_HWM_IRQ,		/* Rx high water mark irq */
> +	MLXBF_TM_TX_LWM_IRQ,		/* Tx low water mark irq */
> +	MLXBF_TM_TX_HWM_IRQ,		/* Tx high water mark irq */
> +	MLXBF_TM_IRQ_CNT
> +};
> +
> +/* Ring types (Rx & Tx). */
> +enum {
> +	MLXBF_TMFIFO_VRING_RX,		/* Rx ring */
> +	MLXBF_TMFIFO_VRING_TX,		/* Tx ring */
> +	MLXBF_TMFIFO_VRING_NUM
> +};
> +
> +struct mlxbf_tmfifo_vdev {
> +	struct virtio_device vdev;	/* virtual device */
> +	u8 status;
> +	u64 features;
> +	union {				/* virtio config space */
> +		struct virtio_console_config cons;
> +		struct virtio_net_config net;
> +	} config;
> +	struct mlxbf_tmfifo_vring vrings[MLXBF_TMFIFO_VRING_NUM];
> +	u8 *tx_buf;			/* tx buffer */
> +	u32 tx_head;			/* tx buffer head */
> +	u32 tx_tail;			/* tx buffer tail */
> +};
> +
> +struct mlxbf_tmfifo_irq_info {
> +	struct mlxbf_tmfifo *fifo;	/* tmfifo structure */
> +	int irq;			/* interrupt number */
> +	int index;			/* array index */
> +};
> +
> +/* TMFIFO device structure */
> +struct mlxbf_tmfifo {
> +	struct mlxbf_tmfifo_vdev *vdev[MLXBF_TMFIFO_VDEV_MAX]; /* devices */
> +	struct platform_device *pdev;	/* platform device */
> +	struct mutex lock;		/* fifo lock */
> +	void __iomem *rx_base;		/* mapped register base */
> +	void __iomem *tx_base;		/* mapped register base */
> +	int tx_fifo_size;		/* number of entries of the Tx FIFO */
> +	int rx_fifo_size;		/* number of entries of the Rx FIFO */
> +	unsigned long pend_events;	/* pending bits for deferred process */
> +	struct mlxbf_tmfifo_irq_info irq_info[MLXBF_TM_IRQ_CNT]; /* irq info */
> +	struct work_struct work;	/* work struct for deferred process */
> +	struct timer_list timer;	/* keepalive timer */
> +	struct mlxbf_tmfifo_vring *vring[2];	/* current Tx/Rx ring */
> +	bool is_ready;			/* ready flag */
> +	spinlock_t spin_lock;		/* spin lock */
> +};
> +
> +union mlxbf_tmfifo_msg_hdr {
> +	struct {
> +		u8 type;		/* message type */
> +		__be16 len;		/* payload length */
> +		u8 unused[5];		/* reserved, set to 0 */
> +	} __packed;
> +	u64 data;
> +};
> +
> +/*
> + * Default MAC.
> + * This MAC address will be read from EFI persistent variable if configured.
> + * It can also be reconfigured with standard Linux tools.
> + */
> +static u8 mlxbf_tmfifo_net_default_mac[6] = {
> +	0x00, 0x1A, 0xCA, 0xFF, 0xFF, 0x01};
> +
> +/* MTU setting of the virtio-net interface. */
> +#define MLXBF_TMFIFO_NET_MTU		1500
> +
> +/* Maximum L2 header length. */
> +#define MLXBF_TMFIFO_NET_L2_OVERHEAD	36
> +
> +/* Supported virtio-net features. */
> +#define MLXBF_TMFIFO_NET_FEATURES	((1UL << VIRTIO_NET_F_MTU) | \
> +					 (1UL << VIRTIO_NET_F_STATUS) | \
> +					 (1UL << VIRTIO_NET_F_MAC))
> +
> +/* Return the consumed Tx buffer space. */
> +static int mlxbf_tmfifo_vdev_tx_buf_len(struct mlxbf_tmfifo_vdev *vdev)
> +{
> +	return ((vdev->tx_tail >= vdev->tx_head) ?
> +	       (vdev->tx_tail - vdev->tx_head) :
> +	       (MLXBF_TMFIFO_CONS_TX_BUF_SIZE - vdev->tx_head + vdev->tx_tail));
> +}

I would suggest splitting the above into a plain if/else for readability.
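Something like this, perhaps (untested sketch of the same logic):

	if (vdev->tx_tail >= vdev->tx_head)
		return vdev->tx_tail - vdev->tx_head;

	return MLXBF_TMFIFO_CONS_TX_BUF_SIZE - vdev->tx_head + vdev->tx_tail;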

> +
> +/* Return the available Tx buffer space. */
> +static int mlxbf_tmfifo_vdev_tx_buf_avail(struct mlxbf_tmfifo_vdev *vdev)
> +{
> +	return (MLXBF_TMFIFO_CONS_TX_BUF_SIZE - 8 -

Think about adding an extra define for
"MLXBF_TMFIFO_CONS_TX_BUF_SIZE - 8".

> +		mlxbf_tmfifo_vdev_tx_buf_len(vdev));
> +}
> +
> +/* Update Tx buffer pointer after pushing data. */
> +static void mlxbf_tmfifo_vdev_tx_buf_push(struct mlxbf_tmfifo_vdev *vdev,
> +					  u32 len)
> +{
> +	vdev->tx_tail += len;
> +	if (vdev->tx_tail >= MLXBF_TMFIFO_CONS_TX_BUF_SIZE)
> +		vdev->tx_tail -= MLXBF_TMFIFO_CONS_TX_BUF_SIZE;
> +}
> +
> +/* Update Tx buffer pointer after popping data. */
> +static void mlxbf_tmfifo_vdev_tx_buf_pop(struct mlxbf_tmfifo_vdev *vdev,
> +					 u32 len)
> +{
> +	vdev->tx_head += len;
> +	if (vdev->tx_head >= MLXBF_TMFIFO_CONS_TX_BUF_SIZE)
> +		vdev->tx_head -= MLXBF_TMFIFO_CONS_TX_BUF_SIZE;
> +}
> +
> +/* Allocate vrings for the fifo. */
> +static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
> +				     struct mlxbf_tmfifo_vdev *tm_vdev,
> +				     int vdev_id)
> +{
> +	struct mlxbf_tmfifo_vring *vring;
> +	dma_addr_t dma;
> +	int i, size;
> +	void *va;
> +
> +	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
> +		vring = &tm_vdev->vrings[i];
> +		vring->fifo = fifo;
> +		vring->size = MLXBF_TMFIFO_VRING_SIZE;
> +		vring->align = SMP_CACHE_BYTES;
> +		vring->id = i;
> +		vring->vdev_id = vdev_id;
> +
> +		size = PAGE_ALIGN(vring_size(vring->size, vring->align));
> +		va = dma_alloc_coherent(tm_vdev->vdev.dev.parent, size, &dma,
> +					GFP_KERNEL);
> +		if (!va) {
> +			dev_err(tm_vdev->vdev.dev.parent,
> +				"vring allocation failed\n");
> +			return -EINVAL;
> +		}
> +
> +		vring->va = va;
> +		vring->dma = dma;
> +	}
> +
> +	return 0;
> +}
> +
> +/* Free vrings of the fifo device. */
> +static void mlxbf_tmfifo_free_vrings(struct mlxbf_tmfifo *fifo, int vdev_id)
> +{
> +	struct mlxbf_tmfifo_vdev *tm_vdev = fifo->vdev[vdev_id];
> +	struct mlxbf_tmfifo_vring *vring;
> +	int i, size;
> +
> +	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
> +		vring = &tm_vdev->vrings[i];
> +
> +		size = PAGE_ALIGN(vring_size(vring->size, vring->align));
> +		if (vring->va) {
> +			dma_free_coherent(tm_vdev->vdev.dev.parent, size,
> +					  vring->va, vring->dma);
> +			vring->va = NULL;
> +			if (vring->vq) {
> +				vring_del_virtqueue(vring->vq);
> +				vring->vq = NULL;
> +			}
> +		}
> +	}
> +}
> +
> +/* Free interrupts of the fifo device. */
> +static void mlxbf_tmfifo_free_irqs(struct mlxbf_tmfifo *fifo)
> +{
> +	int i, irq;
> +
> +	for (i = 0; i < MLXBF_TM_IRQ_CNT; i++) {
> +		irq = fifo->irq_info[i].irq;
> +		if (irq) {
> +			fifo->irq_info[i].irq = 0;
> +			disable_irq(irq);
> +			free_irq(irq, (u8 *)fifo + i);
> +		}
> +	}
> +}
> +
> +/* Interrupt handler. */
> +static irqreturn_t mlxbf_tmfifo_irq_handler(int irq, void *arg)
> +{
> +	struct mlxbf_tmfifo_irq_info *irq_info;
> +
> +	irq_info = (struct mlxbf_tmfifo_irq_info *)arg;
> +
> +	if (irq_info->index < MLXBF_TM_IRQ_CNT &&
> +	    !test_and_set_bit(irq_info->index, &irq_info->fifo->pend_events))
> +		schedule_work(&irq_info->fifo->work);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +/* Nothing to do for now. */
> +static void mlxbf_tmfifo_virtio_dev_release(struct device *dev) { }

If there is nothing to do - no reason to have it.

> +
> +/* Get the next packet descriptor from the vring. */
> +static inline struct vring_desc *
> +mlxbf_tmfifo_virtio_get_next_desc(struct virtqueue *vq)
> +{
> +	struct mlxbf_tmfifo_vring *vring;
> +	unsigned int idx, head;
> +	struct vring *vr;
> +
> +	vring = (struct mlxbf_tmfifo_vring *)vq->priv;
> +	vr = (struct vring *)virtqueue_get_vring(vq);
> +
> +	if (!vr || vring->next_avail == vr->avail->idx)
> +		return NULL;
> +
> +	idx = vring->next_avail % vr->num;
> +	head = vr->avail->ring[idx];
> +	BUG_ON(head >= vr->num);
> +	vring->next_avail++;
> +	return &vr->desc[head];
> +}
> +
> +static inline void mlxbf_tmfifo_virtio_release_desc(
> +	struct virtio_device *vdev, struct vring *vr,
> +	struct vring_desc *desc, u32 len)
> +{
> +	unsigned int idx;
> +
> +	idx = vr->used->idx % vr->num;
> +	vr->used->ring[idx].id = desc - vr->desc;
> +	vr->used->ring[idx].len = cpu_to_virtio32(vdev, len);
> +
> +	/* Virtio could poll and check the 'idx' to decide
> +	 * whether the desc is done or not. Add a memory
> +	 * barrier here to make sure the update above completes
> +	 * before updating the idx.
> +	 */
> +	mb();
> +	vr->used->idx++;
> +}
> +
> +/* Get the total length of a descriptor chain. */
> +static inline u32 mlxbf_tmfifo_virtio_get_pkt_len(struct virtio_device *vdev,
> +						  struct vring_desc *desc,
> +						  struct vring *vr)
> +{
> +	u32 len = 0, idx;
> +
> +	while (desc) {
> +		len += virtio32_to_cpu(vdev, desc->len);
> +		if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
> +			break;
> +		idx = virtio16_to_cpu(vdev, desc->next);
> +		desc = &vr->desc[idx];
> +	}
> +
> +	return len;
> +}
> +
> +static void mlxbf_tmfifo_release_pkt(struct virtio_device *vdev,
> +				     struct mlxbf_tmfifo_vring *vring,
> +				     struct vring_desc **desc)
> +{
> +	struct vring *vr = (struct vring *)virtqueue_get_vring(vring->vq);
> +	struct vring_desc *desc_head;
> +	uint32_t pkt_len = 0;
> +
> +	if (!vr)
> +		return;
> +
> +	if (desc != NULL && *desc != NULL && vring->desc_head != NULL) {
> +		desc_head = vring->desc_head;
> +		pkt_len = vring->pkt_len;
> +	} else {
> +		desc_head = mlxbf_tmfifo_virtio_get_next_desc(vring->vq);
> +		if (desc_head != NULL) {
> +			pkt_len = mlxbf_tmfifo_virtio_get_pkt_len(
> +				vdev, desc_head, vr);
> +		}
> +	}
> +
> +	if (desc_head != NULL)
> +		mlxbf_tmfifo_virtio_release_desc(vdev, vr, desc_head,
> +						 pkt_len);
> +
> +	if (desc != NULL)
> +		*desc = NULL;
> +	vring->pkt_len = 0;
> +}
> +
> +/* House-keeping timer. */
> +static void mlxbf_tmfifo_timer(struct timer_list *arg)
> +{
> +	struct mlxbf_tmfifo *fifo;
> +
> +	fifo = container_of(arg, struct mlxbf_tmfifo, timer);
> +
> +	/*
> +	 * Wake up the work handler to poll the Rx FIFO in case interrupt
> +	 * missing or any leftover bytes stuck in the FIFO.
> +	 */
> +	test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events);
> +
> +	/*
> +	 * Wake up Tx handler in case virtio has queued too many packets
> +	 * and are waiting for buffer return.
> +	 */
> +	test_and_set_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events);
> +
> +	schedule_work(&fifo->work);
> +
> +	mod_timer(&fifo->timer, jiffies + mlxbf_tmfifo_timer_interval);
> +}
> +
> +/* Buffer the console output. */
> +static void mlxbf_tmfifo_console_output(struct mlxbf_tmfifo_vdev *cons,
> +					struct virtqueue *vq)
> +{
> +	struct vring *vr = (struct vring *)virtqueue_get_vring(vq);
> +	struct vring_desc *head_desc, *desc = NULL;
> +	struct virtio_device *vdev = &cons->vdev;
> +	u32 len, pkt_len, idx;
> +	void *addr;
> +
> +	for (;;) {

It's better to modify it as while (some condition).
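For example, something along these lines (a rough sketch only):

	head_desc = mlxbf_tmfifo_virtio_get_next_desc(vq);
	while (head_desc) {
		/* handle one packet */
		head_desc = mlxbf_tmfifo_virtio_get_next_desc(vq);
	}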

> +		head_desc = mlxbf_tmfifo_virtio_get_next_desc(vq);
> +		if (head_desc == NULL)
> +			break;
> +
> +		/* Release the packet if no more space. */
> +		pkt_len = mlxbf_tmfifo_virtio_get_pkt_len(vdev, head_desc, vr);
> +		if (pkt_len > mlxbf_tmfifo_vdev_tx_buf_avail(cons)) {
> +			mlxbf_tmfifo_virtio_release_desc(vdev, vr, head_desc,
> +							 pkt_len);

Why do you break the line here?

> +			break;
> +		}
> +
> +		desc = head_desc;
> +
> +		while (desc != NULL) {
> +			addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
> +			len = virtio32_to_cpu(vdev, desc->len);
> +
> +			if (len <= MLXBF_TMFIFO_CONS_TX_BUF_SIZE -
> +			    cons->tx_tail) {

Why do you break the line here? Also below I see a few strange breaks.

> +				memcpy(cons->tx_buf + cons->tx_tail, addr,
> +				       len);
> +			} else {
> +				u32 seg;
> +
> +				seg = MLXBF_TMFIFO_CONS_TX_BUF_SIZE -
> +					cons->tx_tail;
> +				memcpy(cons->tx_buf + cons->tx_tail, addr,
> +				       seg);
> +				addr += seg;
> +				memcpy(cons->tx_buf, addr, len - seg);
> +			}
> +			mlxbf_tmfifo_vdev_tx_buf_push(cons, len);
> +
> +			if (!(virtio16_to_cpu(vdev, desc->flags) &
> +			    VRING_DESC_F_NEXT))
> +				break;
> +			idx = virtio16_to_cpu(vdev, desc->next);
> +			desc = &vr->desc[idx];
> +		}
> +
> +		mlxbf_tmfifo_virtio_release_desc(vdev, vr, head_desc,
> +						 pkt_len);
> +	}
> +}
> +
> +/* Rx & Tx processing of a virtual queue. */
> +static void mlxbf_tmfifo_virtio_rxtx(struct virtqueue *vq, bool is_rx)
> +{
> +	int num_avail = 0, hdr_len, tx_reserve;
> +	struct mlxbf_tmfifo_vring *vring;
> +	struct mlxbf_tmfifo_vdev *cons;
> +	struct virtio_device *vdev;
> +	struct mlxbf_tmfifo *fifo;
> +	struct vring_desc *desc;
> +	unsigned long flags;
> +	struct vring *vr;
> +	u64 sts, data;
> +	u32 len, idx;
> +	void *addr;
> +
> +	if (!vq)
> +		return;
> +
> +	vring = (struct mlxbf_tmfifo_vring *)vq->priv;
> +	fifo = vring->fifo;
> +	vr = (struct vring *)virtqueue_get_vring(vq);
> +
> +	if (!fifo->vdev[vring->vdev_id])
> +		return;
> +	vdev = &fifo->vdev[vring->vdev_id]->vdev;
> +	cons = fifo->vdev[VIRTIO_ID_CONSOLE];
> +
> +	/* Don't continue if another vring is running. */
> +	if (fifo->vring[is_rx] != NULL && fifo->vring[is_rx] != vring)
> +		return;
> +
> +	/* tx_reserve is used to reserved some room in FIFO for console. */
> +	if (vring->vdev_id == VIRTIO_ID_NET) {
> +		hdr_len = sizeof(struct virtio_net_hdr);
> +		tx_reserve = fifo->tx_fifo_size / 16;

Use some define instead of 16.
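For example (name is only a suggestion):

	/* Reserve 1/16 of the FIFO for console messages. */
	#define MLXBF_TMFIFO_RESERVE_RATIO	16

and then:

	tx_reserve = fifo->tx_fifo_size / MLXBF_TMFIFO_RESERVE_RATIO;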

> +	} else {
> +		BUG_ON(vring->vdev_id != VIRTIO_ID_CONSOLE);
> +		hdr_len = 0;
> +		tx_reserve = 1;
> +	}
> +
> +	desc = vring->desc;
> +
> +	while (1) {

I see there are a few drivers in platform which use while (1),
but it looks better to use while (some condition) and, instead of
break, change this condition to false.
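E.g. a rough sketch of the idea only:

	bool more = true;

	while (more) {
		/* process one word or packet */
		if (num_avail <= 0)
			more = false;
	}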

> +		/* Get available FIFO space. */
> +		if (num_avail == 0) {
> +			if (is_rx) {
> +				/* Get the number of available words in FIFO. */
> +				sts = readq(fifo->rx_base +
> +					    MLXBF_TMFIFO_RX_STS);
> +				num_avail = FIELD_GET(
> +					MLXBF_TMFIFO_RX_STS__COUNT_MASK, sts);

				num_avail = FIELD_GET(TMFIFO_RX_STS__COUNT_MASK, sts);

> +
> +				/* Don't continue if nothing in FIFO. */
> +				if (num_avail <= 0)
> +					break;
> +			} else {
> +				/* Get available space in FIFO. */
> +				sts = readq(fifo->tx_base +
> +					    MLXBF_TMFIFO_TX_STS);
> +				num_avail = fifo->tx_fifo_size - tx_reserve -
> +					FIELD_GET(
> +						MLXBF_TMFIFO_TX_STS__COUNT_MASK,
> +						sts);

Same as above.

> +
> +				if (num_avail <= 0)
> +					break;
> +			}
> +		}
> +
> +		/* Console output always comes from the Tx buffer. */
> +		if (!is_rx && vring->vdev_id == VIRTIO_ID_CONSOLE &&
> +		    cons != NULL && cons->tx_buf != NULL) {
> +			union mlxbf_tmfifo_msg_hdr hdr;
> +			int size;
> +
> +			size = mlxbf_tmfifo_vdev_tx_buf_len(cons);
> +			if (num_avail < 2 || size == 0)
> +				return;
> +			if (size + sizeof(hdr) > num_avail * sizeof(u64))
> +				size = num_avail * sizeof(u64) - sizeof(hdr);
> +			/* Write header. */
> +			hdr.data = 0;
> +			hdr.type = VIRTIO_ID_CONSOLE;
> +			hdr.len = htons(size);
> +			writeq(cpu_to_le64(hdr.data),
> +			       fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
> +
> +			spin_lock_irqsave(&fifo->spin_lock, flags);
> +			while (size > 0) {
> +				addr = cons->tx_buf + cons->tx_head;
> +
> +				if (cons->tx_head + sizeof(u64) <=
> +				    MLXBF_TMFIFO_CONS_TX_BUF_SIZE) {
> +					memcpy(&data, addr, sizeof(u64));
> +				} else {
> +					int partial;
> +
> +					partial =
> +						MLXBF_TMFIFO_CONS_TX_BUF_SIZE -
> +						cons->tx_head;
> +
> +					memcpy(&data, addr, partial);
> +					memcpy((u8 *)&data + partial,
> +					       cons->tx_buf,
> +					       sizeof(u64) - partial);
> +				}
> +				writeq(data,
> +				       fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
> +
> +				if (size >= sizeof(u64)) {
> +					mlxbf_tmfifo_vdev_tx_buf_pop(
> +						cons, sizeof(u64));
> +					size -= sizeof(u64);
> +				} else {
> +					mlxbf_tmfifo_vdev_tx_buf_pop(
> +						cons, size);
> +					size = 0;
> +				}
> +			}
> +			spin_unlock_irqrestore(&fifo->spin_lock, flags);
> +			return;
> +		}
> +
> +		/* Get the desc of next packet. */
> +		if (!desc) {
> +			/* Save the head desc of the chain. */
> +			vring->desc_head =
> +				mlxbf_tmfifo_virtio_get_next_desc(vq);
> +			if (!vring->desc_head) {
> +				vring->desc = NULL;
> +				return;
> +			}
> +			desc = vring->desc_head;
> +			vring->desc = desc;
> +
> +			if (is_rx && vring->vdev_id == VIRTIO_ID_NET) {
> +				struct virtio_net_hdr *net_hdr;
> +
> +				/* Initialize the packet header. */
> +				net_hdr = (struct virtio_net_hdr *)
> +					phys_to_virt(virtio64_to_cpu(
> +						vdev, desc->addr));
> +				memset(net_hdr, 0, sizeof(*net_hdr));
> +			}
> +		}
> +
> +		/* Beginning of each packet. */
> +		if (vring->pkt_len == 0) {
> +			int vdev_id, vring_change = 0;
> +			union mlxbf_tmfifo_msg_hdr hdr;
> +
> +			num_avail--;
> +
> +			/* Read/Write packet length. */
> +			if (is_rx) {
> +				hdr.data = readq(fifo->rx_base +
> +						 MLXBF_TMFIFO_RX_DATA);
> +				hdr.data = le64_to_cpu(hdr.data);
> +
> +				/* Skip the length 0 packet (keepalive). */
> +				if (hdr.len == 0)
> +					continue;
> +
> +				/* Check packet type. */
> +				if (hdr.type == VIRTIO_ID_NET) {
> +					struct virtio_net_config *config;
> +
> +					vdev_id = VIRTIO_ID_NET;
> +					hdr_len = sizeof(struct virtio_net_hdr);
> +					config =
> +					    &fifo->vdev[vdev_id]->config.net;
> +					if (ntohs(hdr.len) > config->mtu +
> +					    MLXBF_TMFIFO_NET_L2_OVERHEAD)
> +						continue;
> +				} else if (hdr.type == VIRTIO_ID_CONSOLE) {
> +					vdev_id = VIRTIO_ID_CONSOLE;
> +					hdr_len = 0;
> +				} else {
> +					continue;
> +				}
> +
> +				/*
> +				 * Check whether the new packet still belongs
> +				 * to this vring or not. If not, update the
> +				 * pkt_len of the new vring and return.
> +				 */
> +				if (vdev_id != vring->vdev_id) {
> +					struct mlxbf_tmfifo_vdev *dev2 =
> +						fifo->vdev[vdev_id];
> +
> +					if (!dev2)
> +						break;
> +					vring->desc = desc;
> +					vring =
> +					  &dev2->vrings[MLXBF_TMFIFO_VRING_RX];
> +					vring_change = 1;
> +				}
> +				vring->pkt_len = ntohs(hdr.len) + hdr_len;
> +			} else {
> +				vring->pkt_len =
> +					mlxbf_tmfifo_virtio_get_pkt_len(
> +						vdev, desc, vr);
> +
> +				hdr.data = 0;
> +				hdr.type = (vring->vdev_id == VIRTIO_ID_NET) ?
> +					VIRTIO_ID_NET :
> +					VIRTIO_ID_CONSOLE;
> +				hdr.len = htons(vring->pkt_len - hdr_len);
> +				writeq(cpu_to_le64(hdr.data),
> +				       fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
> +			}
> +
> +			vring->cur_len = hdr_len;
> +			vring->rem_len = vring->pkt_len;
> +			fifo->vring[is_rx] = vring;
> +
> +			if (vring_change)
> +				return;
> +			continue;
> +		}
> +
> +		/* Check available space in this desc. */
> +		len = virtio32_to_cpu(vdev, desc->len);
> +		if (len > vring->rem_len)
> +			len = vring->rem_len;
> +
> +		/* Check if the current desc is already done. */
> +		if (vring->cur_len == len)
> +			goto check_done;
> +
> +		addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
> +
> +		/* Read a word from FIFO for Rx. */
> +		if (is_rx) {
> +			data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
> +			data = le64_to_cpu(data);
> +		}
> +
> +		if (vring->cur_len + sizeof(u64) <= len) {
> +			/* The whole word. */
> +			if (is_rx) {
> +				memcpy(addr + vring->cur_len, &data,
> +				       sizeof(u64));
> +			} else {
> +				memcpy(&data, addr + vring->cur_len,
> +				       sizeof(u64));
> +			}

Why not just do the following?
The same applies to a few places like this one below.

			if (is_rx)
				memcpy(addr + vring->cur_len, &data, sizeof(u64));
			else
				memcpy(&data, addr + vring->cur_len, sizeof(u64));

> +			vring->cur_len += sizeof(u64);
> +		} else {
> +			/* Leftover bytes. */
> +			BUG_ON(vring->cur_len > len);
> +			if (is_rx) {
> +				memcpy(addr + vring->cur_len, &data,
> +				       len - vring->cur_len);
> +			} else {
> +				memcpy(&data, addr + vring->cur_len,
> +				       len - vring->cur_len);
> +			}
> +			vring->cur_len = len;
> +		}
> +
> +		/* Write the word into FIFO for Tx. */
> +		if (!is_rx) {
> +			writeq(cpu_to_le64(data),
> +			       fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
> +		}
> +
> +		num_avail--;
> +
> +check_done:
> +		/* Check whether this desc is full or completed. */
> +		if (vring->cur_len == len) {
> +			vring->cur_len = 0;
> +			vring->rem_len -= len;
> +
> +			/* Get the next desc on the chain. */
> +			if (vring->rem_len > 0 &&
> +			    (virtio16_to_cpu(vdev, desc->flags) &
> +						VRING_DESC_F_NEXT)) {
> +				idx = virtio16_to_cpu(vdev, desc->next);
> +				desc = &vr->desc[idx];
> +				continue;
> +			}
> +
> +			/* Done and release the desc. */
> +			mlxbf_tmfifo_release_pkt(vdev, vring, &desc);
> +			fifo->vring[is_rx] = NULL;
> +
> +			/* Notify upper layer that packet is done. */
> +			spin_lock_irqsave(&fifo->spin_lock, flags);
> +			vring_interrupt(0, vq);
> +			spin_unlock_irqrestore(&fifo->spin_lock, flags);
> +			continue;
> +		}
> +	}
> +
> +	/* Save the current desc. */
> +	vring->desc = desc;
> +}

I suggest splitting mlxbf_tmfifo_virtio_rxtx() into a few small routines.
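Maybe something along these lines (names are only a suggestion): helpers for
the FIFO availability checks, the console Tx buffer drain, the per-word copy
and the packet header handling, e.g.:

	static int mlxbf_tmfifo_get_rx_avail(struct mlxbf_tmfifo *fifo);
	static int mlxbf_tmfifo_get_tx_avail(struct mlxbf_tmfifo *fifo,
					     struct mlxbf_tmfifo_vring *vring);
	static void mlxbf_tmfifo_console_tx(struct mlxbf_tmfifo *fifo, int avail);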


> +
> +/* The notify function is called when new buffers are posted. */
> +static bool mlxbf_tmfifo_virtio_notify(struct virtqueue *vq)
> +{
> +	struct mlxbf_tmfifo_vring *vring;
> +	struct mlxbf_tmfifo *fifo;
> +	unsigned long flags;
> +
> +	vring = (struct mlxbf_tmfifo_vring *)vq->priv;
> +	fifo = vring->fifo;
> +
> +	/*
> +	 * Virtio maintains vrings in pairs, even number ring for Rx
> +	 * and odd number ring for Tx.
> +	 */
> +	if (!(vring->id & 1)) {
> +		/* Set the RX HWM bit to start Rx. */
> +		if (!test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events))
> +			schedule_work(&fifo->work);
> +	} else {
> +		/*
> +		 * Console could make blocking call with interrupts disabled.
> +		 * In such case, the vring needs to be served right away. For
> +		 * other cases, just set the TX LWM bit to start Tx in the
> +		 * worker handler.
> +		 */
> +		if (vring->vdev_id == VIRTIO_ID_CONSOLE) {
> +			spin_lock_irqsave(&fifo->spin_lock, flags);
> +			mlxbf_tmfifo_console_output(
> +				fifo->vdev[VIRTIO_ID_CONSOLE], vq);

			mlxbf_tmfifo_console_output(fifo->vdev[VIRTIO_ID_CONSOLE], vq);

> +			spin_unlock_irqrestore(&fifo->spin_lock, flags);
> +			schedule_work(&fifo->work);
> +		} else if (!test_and_set_bit(MLXBF_TM_TX_LWM_IRQ,
> +					     &fifo->pend_events))
> +			schedule_work(&fifo->work);

		if {
		} else if {
		}

For consistency, use braces on both branches.

> +	}
> +
> +	return true;
> +}
> +
> +/* Work handler for Rx and Tx case. */
> +static void mlxbf_tmfifo_work_handler(struct work_struct *work)
> +{
> +	struct mlxbf_tmfifo_vdev *tm_vdev;
> +	struct mlxbf_tmfifo *fifo;
> +	int i;
> +
> +	fifo = container_of(work, struct mlxbf_tmfifo, work);
> +	if (!fifo->is_ready)
> +		return;
> +
> +	mutex_lock(&fifo->lock);
> +
> +	/* Tx (Send data to the TmFifo). */
> +	if (test_and_clear_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events) &&
> +		       fifo->irq_info[MLXBF_TM_TX_LWM_IRQ].irq) {
> +		for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++) {

I suggest defining a local variable vq
and having below:
				mlxbf_tmfifo_virtio_rxtx(vq, false);
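
I.e. roughly (with vq declared at the top of the function):

			tm_vdev = fifo->vdev[i];
			if (tm_vdev) {
				vq = tm_vdev->vrings[MLXBF_TMFIFO_VRING_TX].vq;
				mlxbf_tmfifo_virtio_rxtx(vq, false);
			}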

> +			tm_vdev = fifo->vdev[i];
> +			if (tm_vdev != NULL) {
> +				mlxbf_tmfifo_virtio_rxtx(
> +				    tm_vdev->vrings[MLXBF_TMFIFO_VRING_TX].vq,
> +				    false);
> +			}
> +		}
> +	}
> +
> +	/* Rx (Receive data from the TmFifo). */
> +	if (test_and_clear_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events) &&
> +		       fifo->irq_info[MLXBF_TM_RX_HWM_IRQ].irq) {
> +		for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++) {
> +			tm_vdev = fifo->vdev[i];

Same as above.

> +			if (tm_vdev != NULL) {
> +				mlxbf_tmfifo_virtio_rxtx(
> +				    tm_vdev->vrings[MLXBF_TMFIFO_VRING_RX].vq,
> +				    true);
> +			}
> +		}
> +	}
> +
> +	mutex_unlock(&fifo->lock);
> +}
> +
> +/* Get the array of feature bits for this device. */
> +static u64 mlxbf_tmfifo_virtio_get_features(struct virtio_device *vdev)
> +{
> +	struct mlxbf_tmfifo_vdev *tm_vdev;
> +
> +	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
> +	return tm_vdev->features;
> +}
> +
> +/* Confirm device features to use. */
> +static int mlxbf_tmfifo_virtio_finalize_features(struct virtio_device *vdev)
> +{
> +	struct mlxbf_tmfifo_vdev *tm_vdev;
> +
> +	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
> +	tm_vdev->features = vdev->features;
> +
> +	return 0;
> +}
> +
> +/* Free virtqueues found by find_vqs(). */
> +static void mlxbf_tmfifo_virtio_del_vqs(struct virtio_device *vdev)
> +{
> +	struct mlxbf_tmfifo_vdev *tm_vdev;
> +	struct mlxbf_tmfifo_vring *vring;
> +	struct virtqueue *vq;
> +	int i;
> +
> +	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
> +
> +	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
> +		vring = &tm_vdev->vrings[i];
> +
> +		/* Release the pending packet. */
> +		if (vring->desc != NULL) {
> +			mlxbf_tmfifo_release_pkt(&tm_vdev->vdev, vring,
> +						 &vring->desc);
> +		}
> +
> +		vq = vring->vq;
> +		if (vq) {
> +			vring->vq = NULL;
> +			vring_del_virtqueue(vq);
> +		}
> +	}
> +}
> +
> +/* Create and initialize the virtual queues. */
> +static int mlxbf_tmfifo_virtio_find_vqs(struct virtio_device *vdev,
> +					unsigned int nvqs,
> +					struct virtqueue *vqs[],
> +					vq_callback_t *callbacks[],
> +					const char * const names[],
> +					const bool *ctx,
> +					struct irq_affinity *desc)
> +{
> +	struct mlxbf_tmfifo_vdev *tm_vdev;
> +	struct mlxbf_tmfifo_vring *vring;
> +	int i, ret = -EINVAL, size;

Don't initialize ret with -EINVAL.
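I.e. declare it as a plain "int ret;" and set the error code at the failure
site, e.g.:

		if (!names[i]) {
			ret = -EINVAL;
			goto error;
		}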

> +	struct virtqueue *vq;
> +
> +	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
> +	if (nvqs > ARRAY_SIZE(tm_vdev->vrings))
> +		return -EINVAL;
> +
> +	for (i = 0; i < nvqs; ++i) {
> +		if (!names[i])
> +			goto error;
> +		vring = &tm_vdev->vrings[i];
> +
> +		/* zero vring */
> +		size = vring_size(vring->size, vring->align);
> +		memset(vring->va, 0, size);
> +		vq = vring_new_virtqueue(i, vring->size, vring->align, vdev,
> +					 false, false, vring->va,
> +					 mlxbf_tmfifo_virtio_notify,
> +					 callbacks[i], names[i]);
> +		if (!vq) {
> +			dev_err(&vdev->dev, "vring_new_virtqueue failed\n");
> +			ret = -ENOMEM;
> +			goto error;
> +		}
> +
> +		vqs[i] = vq;
> +		vring->vq = vq;
> +		vq->priv = vring;
> +	}
> +
> +	return 0;
> +
> +error:
> +	mlxbf_tmfifo_virtio_del_vqs(vdev);
> +	return ret;
> +}
> +
> +/* Read the status byte. */
> +static u8 mlxbf_tmfifo_virtio_get_status(struct virtio_device *vdev)
> +{
> +	struct mlxbf_tmfifo_vdev *tm_vdev;
> +
> +	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
> +
> +	return tm_vdev->status;
> +}
> +
> +/* Write the status byte. */
> +static void mlxbf_tmfifo_virtio_set_status(struct virtio_device *vdev,
> +					   u8 status)
> +{
> +	struct mlxbf_tmfifo_vdev *tm_vdev;
> +
> +	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
> +	tm_vdev->status = status;
> +}
> +
> +/* Reset the device. Not much here for now. */
> +static void mlxbf_tmfifo_virtio_reset(struct virtio_device *vdev)
> +{
> +	struct mlxbf_tmfifo_vdev *tm_vdev;
> +
> +	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
> +	tm_vdev->status = 0;
> +}
> +
> +/* Read the value of a configuration field. */
> +static void mlxbf_tmfifo_virtio_get(struct virtio_device *vdev,
> +			      unsigned int offset,
> +			      void *buf,
> +			      unsigned int len)
> +{
> +	struct mlxbf_tmfifo_vdev *tm_vdev;
> +
> +	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
> +

	unsigned int pos = offset + len;

	if (pos > sizeof(tm_vdev->config) || pos < len)


> +	if (offset + len > sizeof(tm_vdev->config) || offset + len < len) {
> +		dev_err(vdev->dev.parent, "virtio_get access out of bounds\n");
> +		return;
> +	}
> +
> +	memcpy(buf, (u8 *)&tm_vdev->config + offset, len);
> +}
> +
> +/* Write the value of a configuration field. */
> +static void mlxbf_tmfifo_virtio_set(struct virtio_device *vdev,
> +				 unsigned int offset,
> +				 const void *buf,
> +				 unsigned int len)
> +{
> +	struct mlxbf_tmfifo_vdev *tm_vdev;
> +
> +	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
> +
> +	if (offset + len > sizeof(tm_vdev->config) || offset + len < len) {

Same as above.

> +		dev_err(vdev->dev.parent, "virtio_get access out of bounds\n");
> +		return;
> +	}
> +
> +	memcpy((u8 *)&tm_vdev->config + offset, buf, len);
> +}
> +
> +/* Virtio config operations. */
> +static const struct virtio_config_ops mlxbf_tmfifo_virtio_config_ops = {
> +	.get_features = mlxbf_tmfifo_virtio_get_features,
> +	.finalize_features = mlxbf_tmfifo_virtio_finalize_features,
> +	.find_vqs = mlxbf_tmfifo_virtio_find_vqs,
> +	.del_vqs = mlxbf_tmfifo_virtio_del_vqs,
> +	.reset = mlxbf_tmfifo_virtio_reset,
> +	.set_status = mlxbf_tmfifo_virtio_set_status,
> +	.get_status = mlxbf_tmfifo_virtio_get_status,
> +	.get = mlxbf_tmfifo_virtio_get,
> +	.set = mlxbf_tmfifo_virtio_set,
> +};
> +
> +/* Create vdev type in a tmfifo. */
> +int mlxbf_tmfifo_create_vdev(struct mlxbf_tmfifo *fifo, int vdev_id,
> +			     u64 features, void *config, u32 size)
> +{
> +	struct mlxbf_tmfifo_vdev *tm_vdev;
> +	int ret = 0;
> +
> +	mutex_lock(&fifo->lock);
> +
> +	tm_vdev = fifo->vdev[vdev_id];
> +	if (tm_vdev != NULL) {
> +		pr_err("vdev %d already exists\n", vdev_id);
> +		ret = -EEXIST;
> +		goto already_exist;
> +	}
> +
> +	tm_vdev = kzalloc(sizeof(*tm_vdev), GFP_KERNEL);
> +	if (!tm_vdev) {
> +		ret = -ENOMEM;
> +		goto already_exist;
> +	}
> +
> +	tm_vdev->vdev.id.device = vdev_id;
> +	tm_vdev->vdev.config = &mlxbf_tmfifo_virtio_config_ops;
> +	tm_vdev->vdev.dev.parent = &fifo->pdev->dev;
> +	tm_vdev->vdev.dev.release = mlxbf_tmfifo_virtio_dev_release;
> +	tm_vdev->features = features;
> +	if (config)
> +		memcpy(&tm_vdev->config, config, size);
> +	if (mlxbf_tmfifo_alloc_vrings(fifo, tm_vdev, vdev_id)) {
> +		pr_err("Unable to allocate vring\n");
> +		ret = -ENOMEM;
> +		goto alloc_vring_fail;
> +	}
> +	if (vdev_id == VIRTIO_ID_CONSOLE) {
> +		tm_vdev->tx_buf = kmalloc(MLXBF_TMFIFO_CONS_TX_BUF_SIZE,
> +					  GFP_KERNEL);
> +	}
> +	fifo->vdev[vdev_id] = tm_vdev;
> +
> +	/* Register the virtio device. */
> +	ret = register_virtio_device(&tm_vdev->vdev);
> +	if (ret) {
> +		dev_err(&fifo->pdev->dev, "register_virtio_device() failed\n");
> +		goto register_fail;
> +	}
> +
> +	mutex_unlock(&fifo->lock);
> +	return 0;
> +
> +register_fail:
> +	mlxbf_tmfifo_free_vrings(fifo, vdev_id);
> +	fifo->vdev[vdev_id] = NULL;
> +alloc_vring_fail:
> +	kfree(tm_vdev);
> +already_exist:
> +	mutex_unlock(&fifo->lock);
> +	return ret;
> +}
> +
> +/* Delete vdev type from a tmfifo. */
> +int mlxbf_tmfifo_delete_vdev(struct mlxbf_tmfifo *fifo, int vdev_id)
> +{
> +	struct mlxbf_tmfifo_vdev *tm_vdev;
> +
> +	mutex_lock(&fifo->lock);
> +
> +	/* Unregister vdev. */
> +	tm_vdev = fifo->vdev[vdev_id];
> +	if (tm_vdev) {
> +		unregister_virtio_device(&tm_vdev->vdev);
> +		mlxbf_tmfifo_free_vrings(fifo, vdev_id);
> +		kfree(tm_vdev->tx_buf);
> +		kfree(tm_vdev);
> +		fifo->vdev[vdev_id] = NULL;
> +	}
> +
> +	mutex_unlock(&fifo->lock);
> +
> +	return 0;
> +}
> +
> +/* Device remove function. */
> +static int mlxbf_tmfifo_remove(struct platform_device *pdev)
> +{

Locate it after probe.
If you use all the devm_ APIs, as Andy noted:
devm_ioremap
devm_ioremap_resource
devm_kzalloc
devm_request_mem_region
you can drop all of the kfree, release_mem_region and iounmap calls.

And make the below a separate routine, something like
mlxbf_tmfifo_cleanup(), if you still need it.
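
E.g. a rough sketch for probe() (untested, just to show the idea):

	fifo = devm_kzalloc(&pdev->dev, sizeof(*fifo), GFP_KERNEL);
	if (!fifo)
		return -ENOMEM;

	rx_res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
	fifo->rx_base = devm_ioremap_resource(&pdev->dev, rx_res);
	if (IS_ERR(fifo->rx_base))
		return PTR_ERR(fifo->rx_base);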

> +	struct mlxbf_tmfifo *fifo = platform_get_drvdata(pdev);
> +	struct resource *rx_res, *tx_res;
> +	int i;
> +
> +	if (fifo) {
> +		mutex_lock(&mlxbf_tmfifo_lock);
> +
> +		fifo->is_ready = false;
> +
> +		/* Stop the timer. */
> +		del_timer_sync(&fifo->timer);
> +
> +		/* Release interrupts. */
> +		mlxbf_tmfifo_free_irqs(fifo);
> +
> +		/* Cancel the pending work. */
> +		cancel_work_sync(&fifo->work);
> +
> +		for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++)
> +			mlxbf_tmfifo_delete_vdev(fifo, i);
> +
> +		/* Release IO resources. */
> +		if (fifo->rx_base)
> +			iounmap(fifo->rx_base);
> +		if (fifo->tx_base)
> +			iounmap(fifo->tx_base);
> +
> +		platform_set_drvdata(pdev, NULL);
> +		kfree(fifo);
> +
> +		mutex_unlock(&mlxbf_tmfifo_lock);
> +	}
> +
> +	rx_res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +	if (rx_res)
> +		release_mem_region(rx_res->start, resource_size(rx_res));
> +	tx_res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
> +	if (tx_res)
> +		release_mem_region(tx_res->start, resource_size(tx_res));
> +
> +	return 0;
> +}
> +
> +/* Read the configured network MAC address from efi variable. */
> +static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
> +{
> +	efi_char16_t name[] = {
> +		'R', 's', 'h', 'i', 'm', 'M', 'a', 'c', 'A', 'd', 'd', 'r', 0 };


Could it be moved out and set like:
static const efi_char16_t mlxbf_tmfifo_efi_name[] = "...";
Could you check if there are some examples in the kernel, please?
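
E.g. reusing the name from the array above (just a sketch; the kernel is
built with -fshort-wchar, so a wide string literal matches efi_char16_t):

	static efi_char16_t mlxbf_tmfifo_efi_name[] = L"RshimMacAddr";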

> +	efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
> +	efi_status_t status;
> +	unsigned long size;
> +	u8 buf[6];
> +
> +	size = sizeof(buf);
> +	status = efi.get_variable(name, &guid, NULL, &size, buf);
> +	if (status == EFI_SUCCESS && size == sizeof(buf))
> +		memcpy(mac, buf, sizeof(buf));
> +}
> +
> +/* Probe the TMFIFO. */
> +static int mlxbf_tmfifo_probe(struct platform_device *pdev)
> +{
> +	struct virtio_net_config net_config;
> +	struct resource *rx_res, *tx_res;
> +	struct mlxbf_tmfifo *fifo;
> +	int i, ret;
> +	u64 ctl;
> +
> +	/* Get the resource of the Rx & Tx FIFO. */
> +	rx_res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +	tx_res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
> +	if (!rx_res || !tx_res) {
> +		ret = -EINVAL;
> +		goto err;
> +	}
> +
> +	if (request_mem_region(rx_res->start,
> +			       resource_size(rx_res), "bf-tmfifo") == NULL) {
> +		ret = -EBUSY;
> +		goto early_err;
> +	}
> +
> +	if (request_mem_region(tx_res->start,
> +			       resource_size(tx_res), "bf-tmfifo") == NULL) {
> +		release_mem_region(rx_res->start, resource_size(rx_res));
> +		ret = -EBUSY;
> +		goto early_err;
> +	}
> +
> +	ret = -ENOMEM;
> +	fifo = kzalloc(sizeof(struct mlxbf_tmfifo), GFP_KERNEL);
> +	if (!fifo)
> +		goto err;
> +
> +	fifo->pdev = pdev;
> +	platform_set_drvdata(pdev, fifo);
> +
> +	spin_lock_init(&fifo->spin_lock);
> +	INIT_WORK(&fifo->work, mlxbf_tmfifo_work_handler);
> +
> +	timer_setup(&fifo->timer, mlxbf_tmfifo_timer, 0);
> +	fifo->timer.function = mlxbf_tmfifo_timer;
> +
> +	for (i = 0; i < MLXBF_TM_IRQ_CNT; i++) {
> +		fifo->irq_info[i].index = i;
> +		fifo->irq_info[i].fifo = fifo;
> +		fifo->irq_info[i].irq = platform_get_irq(pdev, i);
> +		ret = request_irq(fifo->irq_info[i].irq,
> +				  mlxbf_tmfifo_irq_handler, 0,
> +				  "tmfifo", &fifo->irq_info[i]);
> +		if (ret) {
> +			pr_err("Unable to request irq\n");
> +			fifo->irq_info[i].irq = 0;
> +			goto err;
> +		}
> +	}
> +
> +	fifo->rx_base = ioremap(rx_res->start, resource_size(rx_res));
> +	if (!fifo->rx_base)
> +		goto err;
> +
> +	fifo->tx_base = ioremap(tx_res->start, resource_size(tx_res));
> +	if (!fifo->tx_base)
> +		goto err;
> +
> +	/* Get Tx FIFO size and set the low/high watermark. */
> +	ctl = readq(fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
> +	fifo->tx_fifo_size =
> +		FIELD_GET(MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK, ctl);
> +	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__LWM_MASK) |
> +		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__LWM_MASK,
> +			   fifo->tx_fifo_size / 2);
> +	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__HWM_MASK) |
> +		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__HWM_MASK,
> +			   fifo->tx_fifo_size - 1);
> +	writeq(ctl, fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
> +
> +	/* Get Rx FIFO size and set the low/high watermark. */
> +	ctl = readq(fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
> +	fifo->rx_fifo_size =
> +		FIELD_GET(MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK, ctl);
> +	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__LWM_MASK) |
> +		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__LWM_MASK, 0);
> +	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__HWM_MASK) |
> +		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__HWM_MASK, 1);
> +	writeq(ctl, fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
> +
> +	mutex_init(&fifo->lock);
> +
> +	/* Create the console vdev. */
> +	ret = mlxbf_tmfifo_create_vdev(fifo, VIRTIO_ID_CONSOLE, 0, NULL, 0);
> +	if (ret)
> +		goto err;
> +
> +	/* Create the network vdev. */
> +	memset(&net_config, 0, sizeof(net_config));
> +	net_config.mtu = MLXBF_TMFIFO_NET_MTU;
> +	net_config.status = VIRTIO_NET_S_LINK_UP;
> +	memcpy(net_config.mac, mlxbf_tmfifo_net_default_mac, 6);
> +	mlxbf_tmfifo_get_cfg_mac(net_config.mac);
> +	ret = mlxbf_tmfifo_create_vdev(fifo, VIRTIO_ID_NET,
> +		MLXBF_TMFIFO_NET_FEATURES, &net_config, sizeof(net_config));
> +	if (ret)
> +		goto err;
> +
> +	mod_timer(&fifo->timer, jiffies + mlxbf_tmfifo_timer_interval);
> +
> +	fifo->is_ready = true;
> +
> +	return 0;
> +
> +err:
> +	mlxbf_tmfifo_remove(pdev);
> +early_err:
> +	dev_err(&pdev->dev, "Probe Failed\n");
> +	return ret;
> +}
> +
> +static const struct of_device_id mlxbf_tmfifo_match[] = {
> +	{ .compatible = "mellanox,bf-tmfifo" },
> +	{},
> +};
> +MODULE_DEVICE_TABLE(of, mlxbf_tmfifo_match);
> +
> +static const struct acpi_device_id mlxbf_tmfifo_acpi_match[] = {
> +	{ "MLNXBF01", 0 },
> +	{},
> +};
> +MODULE_DEVICE_TABLE(acpi, mlxbf_tmfifo_acpi_match);
> +
> +static struct platform_driver mlxbf_tmfifo_driver = {
> +	.probe = mlxbf_tmfifo_probe,
> +	.remove = mlxbf_tmfifo_remove,
> +	.driver = {
> +		.name = "bf-tmfifo",
> +		.of_match_table = mlxbf_tmfifo_match,
> +		.acpi_match_table = ACPI_PTR(mlxbf_tmfifo_acpi_match),
> +	},
> +};
> +
> +module_platform_driver(mlxbf_tmfifo_driver);
> +
> +MODULE_DESCRIPTION("Mellanox BlueField SoC TmFifo Driver");
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("Mellanox Technologies");
> --
> 1.8.3.1


^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v9] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
       [not found] <b143b40446c1870fb8d422b364ead95d54552be9.1527264077.git.lsun@mellanox.com>
                   ` (2 preceding siblings ...)
  2019-01-28 17:28 ` [PATCH v8 2/2] dt-bindings: soc: Add TmFifo binding for Mellanox BlueField SoC Liming Sun
@ 2019-02-13 13:27 ` Liming Sun
  2019-02-13 18:11   ` Andy Shevchenko
  2019-02-28 15:51 ` [PATCH v10] " Liming Sun
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: Liming Sun @ 2019-02-13 13:27 UTC (permalink / raw)
  To: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak
  Cc: Liming Sun, linux-kernel, platform-driver-x86

This commit adds the TmFifo platform driver for Mellanox BlueField
Soc. TmFifo is a shared FIFO which enables external host machine
to exchange data with the SoC via USB or PCIe. The driver is based
on virtio framework and has console and network access enabled.

Signed-off-by: Liming Sun <lsun@mellanox.com>

---

v9: Fix coding styles. Adjust code to use devm_xxx() APIs.
    Removed the DT binding documentation since only ACPI is
    supported for now by UEFI on the SoC.
v8: Re-submit under drivers/platform/mellanox with target-size
    platform driver only.
v7: Added host side drivers into the same patch set.
v5~v6: Coding style fix.
v1~v4: Initial version for directory drivers/soc/mellanox.
---
 drivers/platform/mellanox/Kconfig             |   10 +-
 drivers/platform/mellanox/Makefile            |    1 +
 drivers/platform/mellanox/mlxbf-tmfifo-regs.h |   67 ++
 drivers/platform/mellanox/mlxbf-tmfifo.c      | 1361 +++++++++++++++++++++++++
 4 files changed, 1438 insertions(+), 1 deletion(-)
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo-regs.h
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo.c

diff --git a/drivers/platform/mellanox/Kconfig b/drivers/platform/mellanox/Kconfig
index cd8a908..6feceb1 100644
--- a/drivers/platform/mellanox/Kconfig
+++ b/drivers/platform/mellanox/Kconfig
@@ -5,7 +5,7 @@
 
 menuconfig MELLANOX_PLATFORM
 	bool "Platform support for Mellanox hardware"
-	depends on X86 || ARM || COMPILE_TEST
+	depends on X86 || ARM || ARM64 || COMPILE_TEST
 	---help---
 	  Say Y here to get to see options for platform support for
 	  Mellanox systems. This option alone does not add any kernel code.
@@ -34,4 +34,12 @@ config MLXREG_IO
 	  to system resets operation, system reset causes monitoring and some
 	  kinds of mux selection.
 
+config MLXBF_TMFIFO
+	tristate "Mellanox BlueField SoC TmFifo platform driver"
+	depends on ARM64 && ACPI && VIRTIO_CONSOLE && VIRTIO_NET
+	help
+	  Say y here to enable TmFifo support. The TmFifo driver provides
+          platform driver support for the TmFifo which supports console
+          and networking based on the virtio framework.
+
 endif # MELLANOX_PLATFORM
diff --git a/drivers/platform/mellanox/Makefile b/drivers/platform/mellanox/Makefile
index 57074d9c..f0c061d 100644
--- a/drivers/platform/mellanox/Makefile
+++ b/drivers/platform/mellanox/Makefile
@@ -5,3 +5,4 @@
 #
 obj-$(CONFIG_MLXREG_HOTPLUG)	+= mlxreg-hotplug.o
 obj-$(CONFIG_MLXREG_IO) += mlxreg-io.o
+obj-$(CONFIG_MLXBF_TMFIFO)	+= mlxbf-tmfifo.o
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo-regs.h b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
new file mode 100644
index 0000000..90c9c2cf
--- /dev/null
+++ b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
+ */
+
+#ifndef __MLXBF_TMFIFO_REGS_H__
+#define __MLXBF_TMFIFO_REGS_H__
+
+#include <linux/types.h>
+
+#define MLXBF_TMFIFO_TX_DATA 0x0
+
+#define MLXBF_TMFIFO_TX_STS 0x8
+#define MLXBF_TMFIFO_TX_STS__LENGTH 0x0001
+#define MLXBF_TMFIFO_TX_STS__COUNT_SHIFT 0
+#define MLXBF_TMFIFO_TX_STS__COUNT_WIDTH 9
+#define MLXBF_TMFIFO_TX_STS__COUNT_RESET_VAL 0
+#define MLXBF_TMFIFO_TX_STS__COUNT_RMASK 0x1ff
+#define MLXBF_TMFIFO_TX_STS__COUNT_MASK  0x1ff
+
+#define MLXBF_TMFIFO_TX_CTL 0x10
+#define MLXBF_TMFIFO_TX_CTL__LENGTH 0x0001
+#define MLXBF_TMFIFO_TX_CTL__LWM_SHIFT 0
+#define MLXBF_TMFIFO_TX_CTL__LWM_WIDTH 8
+#define MLXBF_TMFIFO_TX_CTL__LWM_RESET_VAL 128
+#define MLXBF_TMFIFO_TX_CTL__LWM_RMASK 0xff
+#define MLXBF_TMFIFO_TX_CTL__LWM_MASK  0xff
+#define MLXBF_TMFIFO_TX_CTL__HWM_SHIFT 8
+#define MLXBF_TMFIFO_TX_CTL__HWM_WIDTH 8
+#define MLXBF_TMFIFO_TX_CTL__HWM_RESET_VAL 128
+#define MLXBF_TMFIFO_TX_CTL__HWM_RMASK 0xff
+#define MLXBF_TMFIFO_TX_CTL__HWM_MASK  0xff00
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_SHIFT 32
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_WIDTH 9
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RESET_VAL 256
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RMASK 0x1ff
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK  0x1ff00000000ULL
+
+#define MLXBF_TMFIFO_RX_DATA 0x0
+
+#define MLXBF_TMFIFO_RX_STS 0x8
+#define MLXBF_TMFIFO_RX_STS__LENGTH 0x0001
+#define MLXBF_TMFIFO_RX_STS__COUNT_SHIFT 0
+#define MLXBF_TMFIFO_RX_STS__COUNT_WIDTH 9
+#define MLXBF_TMFIFO_RX_STS__COUNT_RESET_VAL 0
+#define MLXBF_TMFIFO_RX_STS__COUNT_RMASK 0x1ff
+#define MLXBF_TMFIFO_RX_STS__COUNT_MASK  0x1ff
+
+#define MLXBF_TMFIFO_RX_CTL 0x10
+#define MLXBF_TMFIFO_RX_CTL__LENGTH 0x0001
+#define MLXBF_TMFIFO_RX_CTL__LWM_SHIFT 0
+#define MLXBF_TMFIFO_RX_CTL__LWM_WIDTH 8
+#define MLXBF_TMFIFO_RX_CTL__LWM_RESET_VAL 128
+#define MLXBF_TMFIFO_RX_CTL__LWM_RMASK 0xff
+#define MLXBF_TMFIFO_RX_CTL__LWM_MASK  0xff
+#define MLXBF_TMFIFO_RX_CTL__HWM_SHIFT 8
+#define MLXBF_TMFIFO_RX_CTL__HWM_WIDTH 8
+#define MLXBF_TMFIFO_RX_CTL__HWM_RESET_VAL 128
+#define MLXBF_TMFIFO_RX_CTL__HWM_RMASK 0xff
+#define MLXBF_TMFIFO_RX_CTL__HWM_MASK  0xff00
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_SHIFT 32
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_WIDTH 9
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RESET_VAL 256
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK 0x1ff
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK  0x1ff00000000ULL
+
+#endif /* !defined(__MLXBF_TMFIFO_REGS_H__) */
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo.c b/drivers/platform/mellanox/mlxbf-tmfifo.c
new file mode 100644
index 0000000..ce55fca
--- /dev/null
+++ b/drivers/platform/mellanox/mlxbf-tmfifo.c
@@ -0,0 +1,1361 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Mellanox BlueField SoC TmFifo driver
+ *
+ * Copyright (C) 2019 Mellanox Technologies
+ */
+
+#include <linux/acpi.h>
+#include <linux/byteorder/generic.h>
+#include <linux/bitfield.h>
+#include <linux/cache.h>
+#include <linux/device.h>
+#include <linux/dma-mapping.h>
+#include <linux/efi.h>
+#include <linux/io.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kernel.h>
+#include <linux/math64.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/mutex.h>
+#include <linux/platform_device.h>
+#include <linux/resource.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/version.h>
+#include <linux/virtio.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_console.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_net.h>
+#include <linux/virtio_ring.h>
+
+#include "mlxbf-tmfifo-regs.h"
+
+/* Vring size. */
+#define MLXBF_TMFIFO_VRING_SIZE			1024
+
+/* Console Tx buffer size. */
+#define MLXBF_TMFIFO_CONS_TX_BUF_SIZE		(32 * 1024)
+
+/* Console Tx buffer size with some reservation. */
+#define MLXBF_TMFIFO_CONS_TX_BUF_RSV_SIZE	\
+	(MLXBF_TMFIFO_CONS_TX_BUF_SIZE - 8)
+
+/* House-keeping timer interval. */
+static int mlxbf_tmfifo_timer_interval = HZ / 10;
+
+/* Global lock. */
+static DEFINE_MUTEX(mlxbf_tmfifo_lock);
+
+/* Virtual devices sharing the TM FIFO. */
+#define MLXBF_TMFIFO_VDEV_MAX		(VIRTIO_ID_CONSOLE + 1)
+
+/*
+ * Reserve 1/16 of TmFifo space, so console messages are not starved by
+ * the networking traffic.
+ */
+#define MLXBF_TMFIFO_RESERVE_RATIO		16
+
+/* Message with data needs at least two words (for header & data). */
+#define MLXBF_TMFIFO_DATA_MIN_WORDS		2
+
+/* Struct declaration. */
+struct mlxbf_tmfifo;
+
+/* Structure to maintain the ring state. */
+struct mlxbf_tmfifo_vring {
+	void *va;			/* virtual address */
+	dma_addr_t dma;			/* dma address */
+	struct virtqueue *vq;		/* virtqueue pointer */
+	struct vring_desc *desc;	/* current desc */
+	struct vring_desc *desc_head;	/* current desc head */
+	int cur_len;			/* processed len in current desc */
+	int rem_len;			/* remaining length to be processed */
+	int size;			/* vring size */
+	int align;			/* vring alignment */
+	int id;				/* vring id */
+	int vdev_id;			/* TMFIFO_VDEV_xxx */
+	u32 pkt_len;			/* packet total length */
+	u16 next_avail;			/* next avail desc id */
+	struct mlxbf_tmfifo *fifo;	/* pointer back to the tmfifo */
+};
+
+/* Interrupt types. */
+enum {
+	MLXBF_TM_RX_LWM_IRQ,		/* Rx low water mark irq */
+	MLXBF_TM_RX_HWM_IRQ,		/* Rx high water mark irq */
+	MLXBF_TM_TX_LWM_IRQ,		/* Tx low water mark irq */
+	MLXBF_TM_TX_HWM_IRQ,		/* Tx high water mark irq */
+	MLXBF_TM_IRQ_CNT
+};
+
+/* Ring types (Rx & Tx). */
+enum {
+	MLXBF_TMFIFO_VRING_RX,		/* Rx ring */
+	MLXBF_TMFIFO_VRING_TX,		/* Tx ring */
+	MLXBF_TMFIFO_VRING_NUM
+};
+
+/* Structure for the virtual device. */
+struct mlxbf_tmfifo_vdev {
+	struct virtio_device vdev;	/* virtual device */
+	u8 status;
+	u64 features;
+	union {				/* virtio config space */
+		struct virtio_console_config cons;
+		struct virtio_net_config net;
+	} config;
+	struct mlxbf_tmfifo_vring vrings[MLXBF_TMFIFO_VRING_NUM];
+	u8 *tx_buf;			/* tx buffer */
+	u32 tx_head;			/* tx buffer head */
+	u32 tx_tail;			/* tx buffer tail */
+};
+
+/* Structure of the interrupt information. */
+struct mlxbf_tmfifo_irq_info {
+	struct mlxbf_tmfifo *fifo;	/* tmfifo structure */
+	int irq;			/* interrupt number */
+	int index;			/* array index */
+};
+
+/* Structure of the TmFifo information. */
+struct mlxbf_tmfifo {
+	struct mlxbf_tmfifo_vdev *vdev[MLXBF_TMFIFO_VDEV_MAX]; /* devices */
+	struct platform_device *pdev;	/* platform device */
+	struct mutex lock;		/* fifo lock */
+	void __iomem *rx_base;		/* mapped register base */
+	void __iomem *tx_base;		/* mapped register base */
+	int tx_fifo_size;		/* number of entries of the Tx FIFO */
+	int rx_fifo_size;		/* number of entries of the Rx FIFO */
+	unsigned long pend_events;	/* pending bits for deferred process */
+	struct mlxbf_tmfifo_irq_info irq_info[MLXBF_TM_IRQ_CNT]; /* irq info */
+	struct work_struct work;	/* work struct for deferred process */
+	struct timer_list timer;	/* keepalive timer */
+	struct mlxbf_tmfifo_vring *vring[2];	/* current Tx/Rx ring */
+	bool is_ready;			/* ready flag */
+	spinlock_t spin_lock;		/* spin lock */
+};
+
+/* Use a union structure for 64-bit little/big endian. */
+union mlxbf_tmfifo_data_64bit {
+	u64 data;
+	__le64 data_le;
+};
+
+/* Message header used to demux data in the TmFifo. */
+union mlxbf_tmfifo_msg_hdr {
+	struct {
+		u8 type;		/* message type */
+		__be16 len;		/* payload length */
+		u8 unused[5];		/* reserved, set to 0 */
+	} __packed;
+	union mlxbf_tmfifo_data_64bit u;	/* 64-bit data */
+};
+
+/*
+ * Default MAC.
+ * This MAC address will be read from EFI persistent variable if configured.
+ * It can also be reconfigured with standard Linux tools.
+ */
+static u8 mlxbf_tmfifo_net_default_mac[6] = {
+	0x00, 0x1A, 0xCA, 0xFF, 0xFF, 0x01};
+
+/* EFI variable name of the MAC address. */
+static efi_char16_t mlxbf_tmfifo_efi_name[] = L"RshimMacAddr";
+
+/* MTU setting of the virtio-net interface. */
+#define MLXBF_TMFIFO_NET_MTU		1500
+
+/* Maximum L2 header length. */
+#define MLXBF_TMFIFO_NET_L2_OVERHEAD	36
+
+/* Supported virtio-net features. */
+#define MLXBF_TMFIFO_NET_FEATURES	((1UL << VIRTIO_NET_F_MTU) | \
+					 (1UL << VIRTIO_NET_F_STATUS) | \
+					 (1UL << VIRTIO_NET_F_MAC))
+
+/* Function declarations. */
+static int mlxbf_tmfifo_remove(struct platform_device *pdev);
+
+/* Console output are buffered and can be accessed with the functions below. */
+
+/* Return the consumed Tx buffer space. */
+static int mlxbf_tmfifo_vdev_tx_buf_len(struct mlxbf_tmfifo_vdev *vdev)
+{
+	return ((vdev->tx_tail >= vdev->tx_head) ?
+		(vdev->tx_tail - vdev->tx_head) :
+		(MLXBF_TMFIFO_CONS_TX_BUF_SIZE - vdev->tx_head +
+		 vdev->tx_tail));
+}
+
+/* Return the available Tx buffer space. */
+static int mlxbf_tmfifo_vdev_tx_buf_avail(struct mlxbf_tmfifo_vdev *vdev)
+{
+	return (MLXBF_TMFIFO_CONS_TX_BUF_RSV_SIZE -
+		mlxbf_tmfifo_vdev_tx_buf_len(vdev));
+}
+
+/* Update Rx/Tx buffer index pointer. */
+static void mlxbf_tmfifo_vdev_tx_buf_index_inc(u32 *index, u32 len)
+{
+	*index += len;
+	if (*index >= MLXBF_TMFIFO_CONS_TX_BUF_SIZE)
+		*index -= MLXBF_TMFIFO_CONS_TX_BUF_SIZE;
+}
+
+/* Allocate vrings for the fifo. */
+static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
+				     struct mlxbf_tmfifo_vdev *tm_vdev,
+				     int vdev_id)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	dma_addr_t dma;
+	int i, size;
+	void *va;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+		vring->fifo = fifo;
+		vring->size = MLXBF_TMFIFO_VRING_SIZE;
+		vring->align = SMP_CACHE_BYTES;
+		vring->id = i;
+		vring->vdev_id = vdev_id;
+
+		size = PAGE_ALIGN(vring_size(vring->size, vring->align));
+		va = dma_alloc_coherent(tm_vdev->vdev.dev.parent, size, &dma,
+					GFP_KERNEL);
+		if (!va) {
+			dev_err(tm_vdev->vdev.dev.parent,
+				"dma_alloc_coherent failed\n");
+			return -ENOMEM;
+		}
+
+		vring->va = va;
+		vring->dma = dma;
+	}
+
+	return 0;
+}
+
+/* Free vrings of the fifo device. */
+static void mlxbf_tmfifo_free_vrings(struct mlxbf_tmfifo *fifo, int vdev_id)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = fifo->vdev[vdev_id];
+	struct mlxbf_tmfifo_vring *vring;
+	int i, size;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+		if (vring->va) {
+			size = PAGE_ALIGN(vring_size(vring->size,
+						     vring->align));
+			dma_free_coherent(tm_vdev->vdev.dev.parent, size,
+					  vring->va, vring->dma);
+			vring->va = NULL;
+			if (vring->vq) {
+				vring_del_virtqueue(vring->vq);
+				vring->vq = NULL;
+			}
+		}
+	}
+}
+
+/* Disable interrupts of the fifo device. */
+static void mlxbf_tmfifo_disable_irqs(struct mlxbf_tmfifo *fifo)
+{
+	int i, irq;
+
+	for (i = 0; i < MLXBF_TM_IRQ_CNT; i++) {
+		irq = fifo->irq_info[i].irq;
+		if (irq) {
+			fifo->irq_info[i].irq = 0;
+			disable_irq(irq);
+		}
+	}
+}
+
+/* Interrupt handler. */
+static irqreturn_t mlxbf_tmfifo_irq_handler(int irq, void *arg)
+{
+	struct mlxbf_tmfifo_irq_info *irq_info;
+
+	irq_info = (struct mlxbf_tmfifo_irq_info *)arg;
+
+	if (irq_info->index < MLXBF_TM_IRQ_CNT &&
+	    !test_and_set_bit(irq_info->index, &irq_info->fifo->pend_events))
+		schedule_work(&irq_info->fifo->work);
+
+	return IRQ_HANDLED;
+}
+
+/* Get the next packet descriptor from the vring. */
+static struct vring_desc *mlxbf_tmfifo_get_next_desc(struct virtqueue *vq)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	unsigned int idx, head;
+	struct vring *vr;
+
+	vr = (struct vring *)virtqueue_get_vring(vq);
+	if (!vr)
+		return NULL;
+	vring = (struct mlxbf_tmfifo_vring *)vq->priv;
+	if (vring->next_avail == virtio16_to_cpu(vq->vdev, vr->avail->idx))
+		return NULL;
+	idx = vring->next_avail % vr->num;
+	head = virtio16_to_cpu(vq->vdev, vr->avail->ring[idx]);
+	if (WARN_ON(head >= vr->num))
+		return NULL;
+	vring->next_avail++;
+
+	return &vr->desc[head];
+}
+
+/* Release virtio descriptor. */
+static void mlxbf_tmfifo_release_desc(struct virtio_device *vdev,
+				      struct vring *vr, struct vring_desc *desc,
+				      u32 len)
+{
+	u16 idx, vr_idx;
+
+	vr_idx = virtio16_to_cpu(vdev, vr->used->idx);
+	idx = vr_idx % vr->num;
+	vr->used->ring[idx].id = cpu_to_virtio32(vdev, desc - vr->desc);
+	vr->used->ring[idx].len = cpu_to_virtio32(vdev, len);
+
+	/* Virtio could poll and check the 'idx' to decide
+	 * whether the desc is done or not. Add a memory
+	 * barrier here to make sure the update above completes
+	 * before updating the idx.
+	 */
+	mb();
+	vr->used->idx = cpu_to_virtio16(vdev, vr_idx + 1);
+}
+
+/* Get the total length of the descriptor chain. */
+static u32 mlxbf_tmfifo_get_pkt_len(struct virtio_device *vdev,
+				    struct vring_desc *desc, struct vring *vr)
+{
+	u32 len = 0, idx;
+
+	while (desc) {
+		len += virtio32_to_cpu(vdev, desc->len);
+		if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
+			break;
+		idx = virtio16_to_cpu(vdev, desc->next);
+		desc = &vr->desc[idx];
+	}
+
+	return len;
+}
+
+static void mlxbf_tmfifo_release_pkt(struct virtio_device *vdev,
+				     struct mlxbf_tmfifo_vring *vring,
+				     struct vring_desc **desc)
+{
+	struct vring_desc *desc_head;
+	struct vring *vr;
+	u32 len = 0;
+
+	vr = (struct vring *)virtqueue_get_vring(vring->vq);
+	if (!vr)
+		return;
+
+	if (desc && *desc && vring->desc_head) {
+		desc_head = vring->desc_head;
+		len = vring->pkt_len;
+	} else {
+		desc_head = mlxbf_tmfifo_get_next_desc(vring->vq);
+		if (desc_head)
+			len = mlxbf_tmfifo_get_pkt_len(vdev, desc_head, vr);
+	}
+
+	if (desc_head)
+		mlxbf_tmfifo_release_desc(vdev, vr, desc_head, len);
+
+	if (desc)
+		*desc = NULL;
+	vring->pkt_len = 0;
+}
+
+/* Get and initialize the next packet. */
+static struct vring_desc *
+mlxbf_tmfifo_get_next_pkt(struct virtio_device *vdev,
+			  struct mlxbf_tmfifo_vring *vring, bool is_rx)
+{
+	struct vring_desc *desc;
+
+	desc = mlxbf_tmfifo_get_next_desc(vring->vq);
+
+	/* Initialize the packet header for received network packet. */
+	if (desc && is_rx && vring->vdev_id == VIRTIO_ID_NET) {
+		struct virtio_net_hdr *net_hdr;
+
+		net_hdr = (struct virtio_net_hdr *)
+			phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+		memset(net_hdr, 0, sizeof(*net_hdr));
+	}
+
+	vring->desc_head = desc;
+	vring->desc = desc;
+
+	return desc;
+}
+
+/* House-keeping timer. */
+static void mlxbf_tmfifo_timer(struct timer_list *arg)
+{
+	struct mlxbf_tmfifo *fifo;
+
+	fifo = container_of(arg, struct mlxbf_tmfifo, timer);
+
+	/*
+	 * Wake up the work handler to poll the Rx FIFO in case interrupt
+	 * missing or any leftover bytes stuck in the FIFO.
+	 */
+	test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events);
+
+	/*
+	 * Wake up Tx handler in case virtio has queued too many packets
+	 * and are waiting for buffer return.
+	 */
+	test_and_set_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events);
+
+	schedule_work(&fifo->work);
+
+	mod_timer(&fifo->timer, jiffies + mlxbf_tmfifo_timer_interval);
+}
+
+/* Copy one console packet into the output buffer. */
+static void mlxbf_tmfifo_console_output_one(struct mlxbf_tmfifo_vdev *cons,
+					    struct virtio_device *vdev,
+					    struct vring *vr,
+					    struct vring_desc *desc)
+{
+	u32 len, idx, seg;
+	void *addr;
+
+	while (desc) {
+		addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+		len = virtio32_to_cpu(vdev, desc->len);
+
+		if (len <= MLXBF_TMFIFO_CONS_TX_BUF_SIZE - cons->tx_tail) {
+			memcpy(cons->tx_buf + cons->tx_tail, addr, len);
+		} else {
+			seg = MLXBF_TMFIFO_CONS_TX_BUF_SIZE - cons->tx_tail;
+			memcpy(cons->tx_buf + cons->tx_tail, addr, seg);
+			addr += seg;
+			memcpy(cons->tx_buf, addr, len - seg);
+		}
+		mlxbf_tmfifo_vdev_tx_buf_index_inc(&cons->tx_tail, len);
+
+		if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
+			break;
+		idx = virtio16_to_cpu(vdev, desc->next);
+		desc = &vr->desc[idx];
+	}
+}
+
+/* Copy console data into the output buffer. */
+static void mlxbf_tmfifo_console_output(struct mlxbf_tmfifo_vdev *cons,
+					struct virtqueue *vq)
+{
+	struct vring *vr = (struct vring *)virtqueue_get_vring(vq);
+	struct virtio_device *vdev = &cons->vdev;
+	struct vring_desc *desc;
+	u32 len;
+
+	desc = mlxbf_tmfifo_get_next_desc(vq);
+	while (desc) {
+		/* Release the packet if not enough space. */
+		len = mlxbf_tmfifo_get_pkt_len(vdev, desc, vr);
+		if (len > mlxbf_tmfifo_vdev_tx_buf_avail(cons)) {
+			mlxbf_tmfifo_release_desc(vdev, vr, desc, len);
+			break;
+		}
+
+		/* Output this packet. */
+		mlxbf_tmfifo_console_output_one(cons, vdev, vr, desc);
+
+		/* Release the head desc. */
+		mlxbf_tmfifo_release_desc(vdev, vr, desc, len);
+
+		/* Get next packet. */
+		desc = mlxbf_tmfifo_get_next_desc(vq);
+	}
+}
+
+/* Get the number of available words in Rx FIFO for receiving. */
+static int mlxbf_tmfifo_get_rx_avail(struct mlxbf_tmfifo *fifo)
+{
+	u64 sts;
+
+	sts = readq(fifo->rx_base + MLXBF_TMFIFO_RX_STS);
+	return FIELD_GET(MLXBF_TMFIFO_RX_STS__COUNT_MASK, sts);
+}
+
+/* Get the number of available words in the TmFifo for sending. */
+static int mlxbf_tmfifo_get_tx_avail(struct mlxbf_tmfifo *fifo,
+				     struct mlxbf_tmfifo_vring *vring)
+{
+	int tx_reserve;
+	u64 sts;
+
+	/* Reserve some room in FIFO for console messages. */
+	if (vring->vdev_id == VIRTIO_ID_NET)
+		tx_reserve = fifo->tx_fifo_size / MLXBF_TMFIFO_RESERVE_RATIO;
+	else
+		tx_reserve = 1;
+
+	sts = readq(fifo->tx_base + MLXBF_TMFIFO_TX_STS);
+	return (fifo->tx_fifo_size - tx_reserve -
+		FIELD_GET(MLXBF_TMFIFO_TX_STS__COUNT_MASK, sts));
+}
+
+/* Console Tx (move data from the output buffer into the TmFifo). */
+static void mlxbf_tmfifo_console_tx(struct mlxbf_tmfifo *fifo, int avail)
+{
+	union mlxbf_tmfifo_msg_hdr hdr;
+	struct mlxbf_tmfifo_vdev *cons;
+	unsigned long flags;
+	int size, partial;
+	void *addr;
+	u64 data;
+
+	/* Return if not enough space available. */
+	if (avail < MLXBF_TMFIFO_DATA_MIN_WORDS)
+		return;
+
+	cons = fifo->vdev[VIRTIO_ID_CONSOLE];
+	if (!cons || !cons->tx_buf)
+		return;
+
+	/* Return if no data to send. */
+	size = mlxbf_tmfifo_vdev_tx_buf_len(cons);
+	if (size == 0)
+		return;
+
+	/* Adjust the size to available space. */
+	if (size + sizeof(hdr) > avail * sizeof(u64))
+		size = avail * sizeof(u64) - sizeof(hdr);
+
+	/* Write header. */
+	hdr.u.data = 0;
+	hdr.type = VIRTIO_ID_CONSOLE;
+	hdr.len = htons(size);
+	hdr.u.data_le = cpu_to_le64(hdr.u.data);
+	writeq(hdr.u.data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+
+	spin_lock_irqsave(&fifo->spin_lock, flags);
+
+	while (size > 0) {
+		addr = cons->tx_buf + cons->tx_head;
+
+		if (cons->tx_head + sizeof(u64) <=
+		    MLXBF_TMFIFO_CONS_TX_BUF_SIZE) {
+			memcpy(&data, addr, sizeof(u64));
+		} else {
+			partial = MLXBF_TMFIFO_CONS_TX_BUF_SIZE - cons->tx_head;
+			memcpy(&data, addr, partial);
+			memcpy((u8 *)&data + partial, cons->tx_buf,
+			       sizeof(u64) - partial);
+		}
+		writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+
+		if (size >= sizeof(u64)) {
+			mlxbf_tmfifo_vdev_tx_buf_index_inc(&cons->tx_head,
+							   sizeof(u64));
+			size -= sizeof(u64);
+		} else {
+			mlxbf_tmfifo_vdev_tx_buf_index_inc(&cons->tx_head,
+							   size);
+			size = 0;
+		}
+	}
+
+	spin_unlock_irqrestore(&fifo->spin_lock, flags);
+}
+
+/* Rx/Tx one word in the descriptor buffer. */
+static void mlxbf_tmfifo_rxtx_word(struct mlxbf_tmfifo *fifo,
+				   struct virtio_device *vdev,
+				   struct mlxbf_tmfifo_vring *vring,
+				   struct vring_desc *desc,
+				   bool is_rx, int *avail, int len)
+{
+	union mlxbf_tmfifo_data_64bit u;
+	void *addr;
+
+	/* Get the buffer address of this desc. */
+	addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+
+	/* Read a word from FIFO for Rx. */
+	if (is_rx) {
+		u.data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
+		u.data = le64_to_cpu(u.data_le);
+	}
+
+	if (vring->cur_len + sizeof(u64) <= len) {
+		/* The whole word. */
+		if (is_rx)
+			memcpy(addr + vring->cur_len, &u.data, sizeof(u64));
+		else
+			memcpy(&u.data, addr + vring->cur_len, sizeof(u64));
+		vring->cur_len += sizeof(u64);
+	} else {
+		/* Leftover bytes. */
+		if (WARN_ON(vring->cur_len > len))
+			return;
+		if (is_rx)
+			memcpy(addr + vring->cur_len, &u.data,
+			       len - vring->cur_len);
+		else
+			memcpy(&u.data, addr + vring->cur_len,
+			       len - vring->cur_len);
+		vring->cur_len = len;
+	}
+
+	/* Write the word into FIFO for Tx. */
+	if (!is_rx) {
+		u.data_le = cpu_to_le64(u.data);
+		writeq(u.data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+	}
+
+	(*avail)--;
+}
+
+/*
+ * Rx/Tx packet header.
+ *
+ * In Rx case, the packet might be found to belong to a different vring since
+ * the TmFifo is shared by different services. In such case, the 'vring_change'
+ * flag is set.
+ */
+static void mlxbf_tmfifo_rxtx_header(struct mlxbf_tmfifo *fifo,
+				     struct virtio_device *vdev,
+				     struct mlxbf_tmfifo_vring *vring,
+				     struct vring *vr,
+				     struct vring_desc *desc,
+				     bool is_rx, int *avail,
+				     int *vring_change)
+{
+	struct virtio_net_config *config;
+	union mlxbf_tmfifo_msg_hdr hdr;
+	int vdev_id;
+	int hdr_len;
+
+	/* Update the available data in the FIFO for the header. */
+	(*avail)--;
+
+	/* Read/Write packet header. */
+	if (is_rx) {
+		/* Drain one word from the FIFO. */
+		hdr.u.data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
+		hdr.u.data = le64_to_cpu(hdr.u.data_le);
+
+		/* Skip the length 0 packets (keepalive). */
+		if (hdr.len == 0)
+			return;
+
+		/* Check packet type. */
+		if (hdr.type == VIRTIO_ID_NET) {
+			vdev_id = VIRTIO_ID_NET;
+			hdr_len = sizeof(struct virtio_net_hdr);
+			config = &fifo->vdev[vdev_id]->config.net;
+			if (ntohs(hdr.len) > config->mtu +
+			    MLXBF_TMFIFO_NET_L2_OVERHEAD)
+				return;
+		} else {
+			vdev_id = VIRTIO_ID_CONSOLE;
+			hdr_len = 0;
+		}
+
+		/*
+		 * Check whether the new packet still belongs to this vring.
+		 * If not, update the pkt_len of the new vring.
+		 */
+		if (vdev_id != vring->vdev_id) {
+			struct mlxbf_tmfifo_vdev *dev2 = fifo->vdev[vdev_id];
+
+			if (!dev2)
+				return;
+			vring->desc = desc;
+			vring = &dev2->vrings[MLXBF_TMFIFO_VRING_RX];
+			*vring_change = 1;
+		}
+		vring->pkt_len = ntohs(hdr.len) + hdr_len;
+	} else {
+		/* Network virtio has an extra header. */
+		hdr_len = (vring->vdev_id == VIRTIO_ID_NET) ?
+			   sizeof(struct virtio_net_hdr) : 0;
+		vring->pkt_len = mlxbf_tmfifo_get_pkt_len(vdev, desc, vr);
+		hdr.u.data = 0;
+		hdr.type = (vring->vdev_id == VIRTIO_ID_NET) ?
+			    VIRTIO_ID_NET : VIRTIO_ID_CONSOLE;
+		hdr.len = htons(vring->pkt_len - hdr_len);
+		hdr.u.data_le = cpu_to_le64(hdr.u.data);
+		writeq(hdr.u.data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+	}
+
+	vring->cur_len = hdr_len;
+	vring->rem_len = vring->pkt_len;
+	fifo->vring[is_rx] = vring;
+}
+
+/*
+ * Rx/Tx one descriptor.
+ *
+ * Return true to indicate more data available.
+ */
+static bool mlxbf_tmfifo_rxtx_one_desc(struct mlxbf_tmfifo *fifo,
+				       struct mlxbf_tmfifo_vring *vring,
+				       bool is_rx, int *avail)
+{
+	struct virtio_device *vdev;
+	struct vring_desc *desc;
+	unsigned long flags;
+	struct vring *vr;
+	u32 len, idx;
+
+	vdev = &fifo->vdev[vring->vdev_id]->vdev;
+
+	/* Get the descriptor of the next packet. */
+	desc = vring->desc;
+	if (!desc) {
+		desc = mlxbf_tmfifo_get_next_pkt(vdev, vring, is_rx);
+		if (!desc)
+			return false;
+	}
+
+	vr = (struct vring *)virtqueue_get_vring(vring->vq);
+
+	/* Beginning of a packet. Start to Rx/Tx packet header. */
+	if (vring->pkt_len == 0) {
+		int vring_change = 0;
+
+		mlxbf_tmfifo_rxtx_header(fifo, vdev, vring, vr, desc, is_rx,
+					 avail, &vring_change);
+		/* Return if new packet is for another ring. */
+		if (vring_change)
+			return false;
+		goto done;
+	}
+
+	/* Get the length of this desc. */
+	len = virtio32_to_cpu(vdev, desc->len);
+	if (len > vring->rem_len)
+		len = vring->rem_len;
+
+	/* Rx/Tx one word (8 bytes) if not done. */
+	if (vring->cur_len != len)
+		mlxbf_tmfifo_rxtx_word(fifo, vdev, vring, desc, is_rx, avail,
+				       len);
+
+	/* Check again whether it's done. */
+	if (vring->cur_len == len) {
+		vring->cur_len = 0;
+		vring->rem_len -= len;
+
+		/* Get the next desc on the chain. */
+		if (vring->rem_len > 0 &&
+		    (virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT)) {
+			idx = virtio16_to_cpu(vdev, desc->next);
+			desc = &vr->desc[idx];
+			goto done;
+		}
+
+		/* Done and release the desc. */
+		mlxbf_tmfifo_release_pkt(vdev, vring, &desc);
+		fifo->vring[is_rx] = NULL;
+
+		/* Notify upper layer that packet is done. */
+		spin_lock_irqsave(&fifo->spin_lock, flags);
+		vring_interrupt(0, vring->vq);
+		spin_unlock_irqrestore(&fifo->spin_lock, flags);
+	}
+
+done:
+	/* Save the current desc. */
+	vring->desc = desc;
+
+	return true;
+}
+
+/* Rx & Tx processing of a queue. */
+static void mlxbf_tmfifo_rxtx(struct virtqueue *vq, bool is_rx)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	struct mlxbf_tmfifo *fifo;
+	int avail = 0;
+	bool more;
+
+	vring = (struct mlxbf_tmfifo_vring *)vq->priv;
+	fifo = vring->fifo;
+
+	/* Return if vdev is not ready. */
+	if (!fifo->vdev[vring->vdev_id])
+		return;
+
+	/* Return if another vring is running. */
+	if (fifo->vring[is_rx] && fifo->vring[is_rx] != vring)
+		return;
+
+	/* Only handle console and network for now. */
+	if (WARN_ON(vring->vdev_id != VIRTIO_ID_NET &&
+		    vring->vdev_id != VIRTIO_ID_CONSOLE))
+		return;
+
+	do {
+		/* Get available FIFO space. */
+		if (avail == 0) {
+			if (is_rx)
+				avail = mlxbf_tmfifo_get_rx_avail(fifo);
+			else
+				avail = mlxbf_tmfifo_get_tx_avail(fifo, vring);
+			if (avail <= 0)
+				break;
+		}
+
+		/* Console output always comes from the Tx buffer. */
+		if (!is_rx && vring->vdev_id == VIRTIO_ID_CONSOLE) {
+			mlxbf_tmfifo_console_tx(fifo, avail);
+			break;
+		}
+
+		/* Try to handle one descriptor. */
+		more = mlxbf_tmfifo_rxtx_one_desc(fifo, vring, is_rx, &avail);
+	} while (more);
+}
+
+/* Handle Rx or Tx queues. */
+static void mlxbf_tmfifo_work_rxtx(struct mlxbf_tmfifo *fifo, int queue_id,
+				   int irq_id, bool is_rx)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct virtqueue *vq;
+	int i;
+
+	if (!test_and_clear_bit(irq_id, &fifo->pend_events) ||
+	    !fifo->irq_info[irq_id].irq)
+		return;
+
+	for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++) {
+		tm_vdev = fifo->vdev[i];
+		if (tm_vdev) {
+			vq = tm_vdev->vrings[queue_id].vq;
+			if (vq)
+				mlxbf_tmfifo_rxtx(vq, is_rx);
+		}
+	}
+}
+
+/* Work handler for Rx and Tx case. */
+static void mlxbf_tmfifo_work_handler(struct work_struct *work)
+{
+	struct mlxbf_tmfifo *fifo;
+
+	fifo = container_of(work, struct mlxbf_tmfifo, work);
+	if (!fifo->is_ready)
+		return;
+
+	mutex_lock(&fifo->lock);
+
+	/* Tx (Send data to the TmFifo). */
+	mlxbf_tmfifo_work_rxtx(fifo, MLXBF_TMFIFO_VRING_TX,
+			       MLXBF_TM_TX_LWM_IRQ, false);
+
+	/* Rx (Receive data from the TmFifo). */
+	mlxbf_tmfifo_work_rxtx(fifo, MLXBF_TMFIFO_VRING_RX,
+			       MLXBF_TM_RX_HWM_IRQ, true);
+
+	mutex_unlock(&fifo->lock);
+}
+
+/* The notify function is called when new buffers are posted. */
+static bool mlxbf_tmfifo_virtio_notify(struct virtqueue *vq)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	struct mlxbf_tmfifo_vdev *vdev;
+	struct mlxbf_tmfifo *fifo;
+	unsigned long flags;
+
+	vring = (struct mlxbf_tmfifo_vring *)vq->priv;
+	fifo = vring->fifo;
+
+	/*
+	 * Virtio maintains vrings in pairs, even-numbered rings for Rx
+	 * and odd-numbered rings for Tx.
+	 */
+	if (!(vring->id & 1)) {
+		/* Set the RX HWM bit to start Rx. */
+		if (!test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events))
+			schedule_work(&fifo->work);
+	} else {
+		/*
+		 * The console could make a blocking call with interrupts
+		 * disabled. In that case, the vring needs to be served right
+		 * away. For other cases, just set the TX LWM bit to start Tx
+		 * in the worker handler.
+		 */
+		if (vring->vdev_id == VIRTIO_ID_CONSOLE) {
+			spin_lock_irqsave(&fifo->spin_lock, flags);
+			vdev = fifo->vdev[VIRTIO_ID_CONSOLE];
+			mlxbf_tmfifo_console_output(vdev, vq);
+			spin_unlock_irqrestore(&fifo->spin_lock, flags);
+			schedule_work(&fifo->work);
+		} else if (!test_and_set_bit(MLXBF_TM_TX_LWM_IRQ,
+					     &fifo->pend_events)) {
+			schedule_work(&fifo->work);
+		}
+	}
+
+	return true;
+}
+
+/* Get the array of feature bits for this device. */
+static u64 mlxbf_tmfifo_virtio_get_features(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
+	return tm_vdev->features;
+}
+
+/* Confirm device features to use. */
+static int mlxbf_tmfifo_virtio_finalize_features(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
+	tm_vdev->features = vdev->features;
+
+	return 0;
+}
+
+/* Free virtqueues found by find_vqs(). */
+static void mlxbf_tmfifo_virtio_del_vqs(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct mlxbf_tmfifo_vring *vring;
+	struct virtqueue *vq;
+	int i;
+
+	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+
+		/* Release the pending packet. */
+		if (vring->desc)
+			mlxbf_tmfifo_release_pkt(&tm_vdev->vdev, vring,
+						 &vring->desc);
+		vq = vring->vq;
+		if (vq) {
+			vring->vq = NULL;
+			vring_del_virtqueue(vq);
+		}
+	}
+}
+
+/* Create and initialize the virtual queues. */
+static int mlxbf_tmfifo_virtio_find_vqs(struct virtio_device *vdev,
+					unsigned int nvqs,
+					struct virtqueue *vqs[],
+					vq_callback_t *callbacks[],
+					const char * const names[],
+					const bool *ctx,
+					struct irq_affinity *desc)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct mlxbf_tmfifo_vring *vring;
+	struct virtqueue *vq;
+	int i, ret, size;
+
+	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
+	if (nvqs > ARRAY_SIZE(tm_vdev->vrings))
+		return -EINVAL;
+
+	for (i = 0; i < nvqs; ++i) {
+		if (!names[i]) {
+			ret = -EINVAL;
+			goto error;
+		}
+		vring = &tm_vdev->vrings[i];
+
+		/* zero vring */
+		size = vring_size(vring->size, vring->align);
+		memset(vring->va, 0, size);
+		vq = vring_new_virtqueue(i, vring->size, vring->align, vdev,
+					 false, false, vring->va,
+					 mlxbf_tmfifo_virtio_notify,
+					 callbacks[i], names[i]);
+		if (!vq) {
+			dev_err(&vdev->dev, "vring_new_virtqueue failed\n");
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		vqs[i] = vq;
+		vring->vq = vq;
+		vq->priv = vring;
+	}
+
+	return 0;
+
+error:
+	mlxbf_tmfifo_virtio_del_vqs(vdev);
+	return ret;
+}
+
+/* Read the status byte. */
+static u8 mlxbf_tmfifo_virtio_get_status(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
+
+	return tm_vdev->status;
+}
+
+/* Write the status byte. */
+static void mlxbf_tmfifo_virtio_set_status(struct virtio_device *vdev,
+					   u8 status)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
+	tm_vdev->status = status;
+}
+
+/* Reset the device. Not much here for now. */
+static void mlxbf_tmfifo_virtio_reset(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
+	tm_vdev->status = 0;
+}
+
+/* Read the value of a configuration field. */
+static void mlxbf_tmfifo_virtio_get(struct virtio_device *vdev,
+				    unsigned int offset,
+				    void *buf,
+				    unsigned int len)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
+
+	if (offset + len > sizeof(tm_vdev->config)) {
+		dev_err(vdev->dev.parent, "virtio_get out of bounds\n");
+		return;
+	}
+
+	memcpy(buf, (u8 *)&tm_vdev->config + offset, len);
+}
+
+/* Write the value of a configuration field. */
+static void mlxbf_tmfifo_virtio_set(struct virtio_device *vdev,
+				    unsigned int offset,
+				    const void *buf,
+				    unsigned int len)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
+
+	if (offset + len > sizeof(tm_vdev->config)) {
+		dev_err(vdev->dev.parent, "virtio_set out of bounds\n");
+		return;
+	}
+
+	memcpy((u8 *)&tm_vdev->config + offset, buf, len);
+}
+
+/* Virtio config operations. */
+static const struct virtio_config_ops mlxbf_tmfifo_virtio_config_ops = {
+	.get_features = mlxbf_tmfifo_virtio_get_features,
+	.finalize_features = mlxbf_tmfifo_virtio_finalize_features,
+	.find_vqs = mlxbf_tmfifo_virtio_find_vqs,
+	.del_vqs = mlxbf_tmfifo_virtio_del_vqs,
+	.reset = mlxbf_tmfifo_virtio_reset,
+	.set_status = mlxbf_tmfifo_virtio_set_status,
+	.get_status = mlxbf_tmfifo_virtio_get_status,
+	.get = mlxbf_tmfifo_virtio_get,
+	.set = mlxbf_tmfifo_virtio_set,
+};
+
+/* Create vdev type in a tmfifo. */
+static int mlxbf_tmfifo_create_vdev(struct device *dev,
+				    struct mlxbf_tmfifo *fifo,
+				    int vdev_id, u64 features,
+				    void *config, u32 size)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	int ret = 0;
+
+	mutex_lock(&fifo->lock);
+
+	tm_vdev = fifo->vdev[vdev_id];
+	if (tm_vdev) {
+		dev_err(dev, "vdev %d already exists\n", vdev_id);
+		ret = -EEXIST;
+		goto fail;
+	}
+
+	tm_vdev = devm_kzalloc(dev, sizeof(*tm_vdev), GFP_KERNEL);
+	if (!tm_vdev) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	tm_vdev->vdev.id.device = vdev_id;
+	tm_vdev->vdev.config = &mlxbf_tmfifo_virtio_config_ops;
+	tm_vdev->vdev.dev.parent = &fifo->pdev->dev;
+	tm_vdev->features = features;
+	if (config)
+		memcpy(&tm_vdev->config, config, size);
+	if (mlxbf_tmfifo_alloc_vrings(fifo, tm_vdev, vdev_id)) {
+		dev_err(dev, "unable to allocate vring\n");
+		ret = -ENOMEM;
+		goto fail;
+	}
+	if (vdev_id == VIRTIO_ID_CONSOLE)
+		tm_vdev->tx_buf = devm_kmalloc(dev,
+					       MLXBF_TMFIFO_CONS_TX_BUF_SIZE,
+					       GFP_KERNEL);
+	fifo->vdev[vdev_id] = tm_vdev;
+
+	/* Register the virtio device. */
+	ret = register_virtio_device(&tm_vdev->vdev);
+	if (ret) {
+		dev_err(&fifo->pdev->dev, "register_virtio_device failed\n");
+		goto register_fail;
+	}
+
+	mutex_unlock(&fifo->lock);
+	return 0;
+
+register_fail:
+	mlxbf_tmfifo_free_vrings(fifo, vdev_id);
+	fifo->vdev[vdev_id] = NULL;
+fail:
+	mutex_unlock(&fifo->lock);
+	return ret;
+}
+
+/* Delete vdev type from a tmfifo. */
+static int mlxbf_tmfifo_delete_vdev(struct mlxbf_tmfifo *fifo, int vdev_id)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	mutex_lock(&fifo->lock);
+
+	/* Unregister vdev. */
+	tm_vdev = fifo->vdev[vdev_id];
+	if (tm_vdev) {
+		unregister_virtio_device(&tm_vdev->vdev);
+		mlxbf_tmfifo_free_vrings(fifo, vdev_id);
+		fifo->vdev[vdev_id] = NULL;
+	}
+
+	mutex_unlock(&fifo->lock);
+
+	return 0;
+}
+
+/* Read the configured network MAC address from efi variable. */
+static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
+{
+	efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
+	efi_status_t status;
+	unsigned long size;
+	u8 buf[6];
+
+	size = sizeof(buf);
+	status = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size,
+				  buf);
+	if (status == EFI_SUCCESS && size == sizeof(buf))
+		memcpy(mac, buf, sizeof(buf));
+}
+
+/* Set the TmFifo thresholds which are used to trigger interrupts. */
+static void mlxbf_tmfifo_set_threshold(struct mlxbf_tmfifo *fifo)
+{
+	u64 ctl;
+
+	/* Get Tx FIFO size and set the low/high watermark. */
+	ctl = readq(fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
+	fifo->tx_fifo_size =
+		FIELD_GET(MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK, ctl);
+	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__LWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__LWM_MASK,
+			   fifo->tx_fifo_size / 2);
+	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__HWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__HWM_MASK,
+			   fifo->tx_fifo_size - 1);
+	writeq(ctl, fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
+
+	/* Get Rx FIFO size and set the low/high watermark. */
+	ctl = readq(fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
+	fifo->rx_fifo_size =
+		FIELD_GET(MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK, ctl);
+	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__LWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__LWM_MASK, 0);
+	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__HWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__HWM_MASK, 1);
+	writeq(ctl, fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
+}
+
+/* TmFifo cleanup, called from probe failure and device remove. */
+static void mlxbf_tmfifo_cleanup(struct mlxbf_tmfifo *fifo)
+{
+	int i;
+
+	mutex_lock(&mlxbf_tmfifo_lock);
+	fifo->is_ready = false;
+
+	/* Stop the timer. */
+	del_timer_sync(&fifo->timer);
+
+	/* Disable interrupts. */
+	mlxbf_tmfifo_disable_irqs(fifo);
+
+	/* Cancel the pending work. */
+	cancel_work_sync(&fifo->work);
+
+	for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++)
+		mlxbf_tmfifo_delete_vdev(fifo, i);
+
+	mutex_unlock(&mlxbf_tmfifo_lock);
+}
+
+/* Probe the TMFIFO. */
+static int mlxbf_tmfifo_probe(struct platform_device *pdev)
+{
+	struct virtio_net_config net_config;
+	struct resource *rx_res, *tx_res;
+	struct mlxbf_tmfifo *fifo;
+	int i, ret;
+
+	/* Get the resource of the Rx FIFO. */
+	rx_res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (!rx_res)
+		return -ENODEV;
+
+	/* Get the resource of the Tx FIFO. */
+	tx_res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+	if (!tx_res)
+		return -ENODEV;
+
+	if (!devm_request_mem_region(&pdev->dev, rx_res->start,
+				     resource_size(rx_res), "bf-tmfifo"))
+		return -EBUSY;
+
+	if (!devm_request_mem_region(&pdev->dev, tx_res->start,
+				     resource_size(tx_res), "bf-tmfifo"))
+		return -EBUSY;
+
+	fifo = devm_kzalloc(&pdev->dev, sizeof(*fifo), GFP_KERNEL);
+	if (!fifo)
+		return -ENOMEM;
+
+	fifo->pdev = pdev;
+	platform_set_drvdata(pdev, fifo);
+
+	spin_lock_init(&fifo->spin_lock);
+	INIT_WORK(&fifo->work, mlxbf_tmfifo_work_handler);
+
+	timer_setup(&fifo->timer, mlxbf_tmfifo_timer, 0);
+
+	for (i = 0; i < MLXBF_TM_IRQ_CNT; i++) {
+		fifo->irq_info[i].index = i;
+		fifo->irq_info[i].fifo = fifo;
+		fifo->irq_info[i].irq = platform_get_irq(pdev, i);
+		ret = devm_request_irq(&pdev->dev, fifo->irq_info[i].irq,
+				       mlxbf_tmfifo_irq_handler, 0,
+				       "tmfifo", &fifo->irq_info[i]);
+		if (ret) {
+			dev_err(&pdev->dev, "devm_request_irq failed\n");
+			fifo->irq_info[i].irq = 0;
+			return ret;
+		}
+	}
+
+	fifo->rx_base = devm_ioremap(&pdev->dev, rx_res->start,
+				     resource_size(rx_res));
+	if (!fifo->rx_base)
+		return -ENOMEM;
+
+	fifo->tx_base = devm_ioremap(&pdev->dev, tx_res->start,
+				     resource_size(tx_res));
+	if (!fifo->tx_base)
+		return -ENOMEM;
+
+	mutex_init(&fifo->lock);
+
+	mlxbf_tmfifo_set_threshold(fifo);
+
+	/* Create the console vdev. */
+	ret = mlxbf_tmfifo_create_vdev(&pdev->dev, fifo, VIRTIO_ID_CONSOLE, 0,
+				       NULL, 0);
+	if (ret)
+		goto fail;
+
+	/* Create the network vdev. */
+	memset(&net_config, 0, sizeof(net_config));
+	net_config.mtu = MLXBF_TMFIFO_NET_MTU;
+	net_config.status = VIRTIO_NET_S_LINK_UP;
+	memcpy(net_config.mac, mlxbf_tmfifo_net_default_mac, 6);
+	mlxbf_tmfifo_get_cfg_mac(net_config.mac);
+	ret = mlxbf_tmfifo_create_vdev(&pdev->dev, fifo, VIRTIO_ID_NET,
+				       MLXBF_TMFIFO_NET_FEATURES, &net_config,
+				       sizeof(net_config));
+	if (ret)
+		goto fail;
+
+	mod_timer(&fifo->timer, jiffies + mlxbf_tmfifo_timer_interval);
+
+	fifo->is_ready = true;
+	return 0;
+
+fail:
+	mlxbf_tmfifo_cleanup(fifo);
+	return ret;
+}
+
+/* Device remove function. */
+static int mlxbf_tmfifo_remove(struct platform_device *pdev)
+{
+	struct mlxbf_tmfifo *fifo = platform_get_drvdata(pdev);
+
+	if (fifo)
+		mlxbf_tmfifo_cleanup(fifo);
+
+	platform_set_drvdata(pdev, NULL);
+
+	return 0;
+}
+
+static const struct acpi_device_id mlxbf_tmfifo_acpi_match[] = {
+	{ "MLNXBF01", 0 },
+	{}
+};
+MODULE_DEVICE_TABLE(acpi, mlxbf_tmfifo_acpi_match);
+
+static struct platform_driver mlxbf_tmfifo_driver = {
+	.probe = mlxbf_tmfifo_probe,
+	.remove = mlxbf_tmfifo_remove,
+	.driver = {
+		.name = "bf-tmfifo",
+		.acpi_match_table = mlxbf_tmfifo_acpi_match,
+	},
+};
+
+module_platform_driver(mlxbf_tmfifo_driver);
+
+MODULE_DESCRIPTION("Mellanox BlueField SoC TmFifo Driver");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mellanox Technologies");
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* RE: [PATCH v8 1/2] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
  2019-01-29 22:06   ` Andy Shevchenko
@ 2019-02-13 16:33     ` Liming Sun
  0 siblings, 0 replies; 30+ messages in thread
From: Liming Sun @ 2019-02-13 16:33 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Rob Herring, Mark Rutland, Arnd Bergmann, David Woods,
	Andy Shevchenko, Darren Hart, Vadim Pasternak, devicetree,
	Linux Kernel Mailing List, Platform Driver

Thanks Andy. Sorry, I had an email issue earlier today and I'm not sure whether the reply was sent out, so I'm sending another one just in case...

v9 of this patch has been posted to address the 'devm_' comment. It also includes the coding-style changes made according to the comments I received for another patch.

Regards,
Liming

> -----Original Message-----
> From: Andy Shevchenko <andy.shevchenko@gmail.com>
> Sent: Tuesday, January 29, 2019 5:07 PM
> To: Liming Sun <lsun@mellanox.com>
> Cc: Rob Herring <robh+dt@kernel.org>; Mark Rutland <mark.rutland@arm.com>; Arnd Bergmann <arnd@arndb.de>; David Woods
> <dwoods@mellanox.com>; Andy Shevchenko <andy@infradead.org>; Darren Hart <dvhart@infradead.org>; Vadim Pasternak
> <vadimp@mellanox.com>; devicetree <devicetree@vger.kernel.org>; Linux Kernel Mailing List <linux-kernel@vger.kernel.org>;
> Platform Driver <platform-driver-x86@vger.kernel.org>
> Subject: Re: [PATCH v8 1/2] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
> 
> On Mon, Jan 28, 2019 at 7:28 PM Liming Sun <lsun@mellanox.com> wrote:
> >
> > This commit adds the TmFifo platform driver for Mellanox BlueField
> > Soc. TmFifo is a shared FIFO which enables external host machine
> > to exchange data with the SoC via USB or PCIe. The driver is based
> > on virtio framework and has console and network access enabled.
> >
> > Reviewed-by: David Woods <dwoods@mellanox.com>
> > Signed-off-by: Liming Sun <lsun@mellanox.com>
> 
> 
> Please, go through this series taking into account review I just did
> for your another patch.
> 
> On top of that, see recent (for few years I think) drivers what modern
> APIs they are using, e.g. devm_.
> 
> --
> With Best Regards,
> Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v9] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
  2019-02-13 13:27 ` [PATCH v9] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc Liming Sun
@ 2019-02-13 18:11   ` Andy Shevchenko
  2019-02-13 18:34     ` Liming Sun
                       ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Andy Shevchenko @ 2019-02-13 18:11 UTC (permalink / raw)
  To: Liming Sun
  Cc: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak,
	Linux Kernel Mailing List, Platform Driver

On Wed, Feb 13, 2019 at 3:27 PM Liming Sun <lsun@mellanox.com> wrote:
>
> This commit adds the TmFifo platform driver for Mellanox BlueField
> Soc. TmFifo is a shared FIFO which enables external host machine
> to exchange data with the SoC via USB or PCIe. The driver is based
> on virtio framework and has console and network access enabled.

Thanks for an update, my comments below.

Again, to Mellanox: guys, please, establish internal mailing list for
review and don't come with such quality of code.

Next time I would like to see Reviewed-by from Mellanox people I know,
like Vadim or Leon.

> +config MLXBF_TMFIFO
> +       tristate "Mellanox BlueField SoC TmFifo platform driver"

> +       depends on ARM64 && ACPI && VIRTIO_CONSOLE && VIRTIO_NET

Split this to three logical parts.

> +       help
> +         Say y here to enable TmFifo support. The TmFifo driver provides
> +          platform driver support for the TmFifo which supports console
> +          and networking based on the virtio framework.

>  obj-$(CONFIG_MLXREG_HOTPLUG)   += mlxreg-hotplug.o
>  obj-$(CONFIG_MLXREG_IO) += mlxreg-io.o
> +obj-$(CONFIG_MLXBF_TMFIFO)     += mlxbf-tmfifo.o

I would suggest to keep it sorted.

> +#define MLXBF_TMFIFO_TX_DATA 0x0

I suggest to use same fixed format for offsets.
Here, for example, 0x00 would be better.

> +#define MLXBF_TMFIFO_TX_STS__COUNT_RMASK 0x1ff
> +#define MLXBF_TMFIFO_TX_STS__COUNT_MASK  0x1ff

#include <linux/bits.h>
...
GENMASK()
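
For instance, a sketch assuming the same bit layout as the 0x1ff literal:

	#define MLXBF_TMFIFO_TX_STS__COUNT_MASK	GENMASK(8, 0)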

> +#define MLXBF_TMFIFO_TX_CTL__LWM_RMASK 0xff
> +#define MLXBF_TMFIFO_TX_CTL__LWM_MASK  0xff

> +#define MLXBF_TMFIFO_TX_CTL__HWM_RMASK 0xff
> +#define MLXBF_TMFIFO_TX_CTL__HWM_MASK  0xff00

> +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RMASK 0x1ff
> +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK  0x1ff00000000ULL

GENMASK() / GENMASK_ULL()
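
E.g., a sketch assuming the bit positions implied by the literals above:

	#define MLXBF_TMFIFO_TX_CTL__LWM_MASK		GENMASK(7, 0)
	#define MLXBF_TMFIFO_TX_CTL__HWM_MASK		GENMASK(15, 8)
	#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK	GENMASK_ULL(40, 32)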

> +#define MLXBF_TMFIFO_RX_STS__COUNT_RMASK 0x1ff
> +#define MLXBF_TMFIFO_RX_STS__COUNT_MASK  0x1ff

GENMASK()

> +#define MLXBF_TMFIFO_RX_CTL__LWM_RMASK 0xff
> +#define MLXBF_TMFIFO_RX_CTL__LWM_MASK  0xff

> +#define MLXBF_TMFIFO_RX_CTL__HWM_RMASK 0xff
> +#define MLXBF_TMFIFO_RX_CTL__HWM_MASK  0xff00

> +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK 0x1ff
> +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK  0x1ff00000000ULL

Ditto.

> +#include <linux/acpi.h>
> +#include <linux/byteorder/generic.h>
> +#include <linux/bitfield.h>
> +#include <linux/cache.h>
> +#include <linux/device.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/efi.h>
> +#include <linux/io.h>
> +#include <linux/interrupt.h>
> +#include <linux/irq.h>
> +#include <linux/kernel.h>
> +#include <linux/math64.h>
> +#include <linux/module.h>
> +#include <linux/moduleparam.h>
> +#include <linux/mutex.h>
> +#include <linux/platform_device.h>
> +#include <linux/resource.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +#include <linux/version.h>
> +#include <linux/virtio.h>
> +#include <linux/virtio_config.h>
> +#include <linux/virtio_console.h>
> +#include <linux/virtio_ids.h>
> +#include <linux/virtio_net.h>
> +#include <linux/virtio_ring.h>

Do you need all of them?

> +#define MLXBF_TMFIFO_VRING_SIZE                        1024

SZ_1K ?

> +/* Console Tx buffer size. */
> +#define MLXBF_TMFIFO_CONS_TX_BUF_SIZE          (32 * 1024)

SZ_32K ?

> +/* House-keeping timer interval. */
> +static int mlxbf_tmfifo_timer_interval = HZ / 10;

> +/* Global lock. */

Noise. Either explain what it protects, or remove.

> +static DEFINE_MUTEX(mlxbf_tmfifo_lock);

> +/* Struct declaration. */

Noise.

> +/* Structure to maintain the ring state. */
> +struct mlxbf_tmfifo_vring {
> +       void *va;                       /* virtual address */
> +       dma_addr_t dma;                 /* dma address */
> +       struct virtqueue *vq;           /* virtqueue pointer */
> +       struct vring_desc *desc;        /* current desc */
> +       struct vring_desc *desc_head;   /* current desc head */
> +       int cur_len;                    /* processed len in current desc */
> +       int rem_len;                    /* remaining length to be processed */
> +       int size;                       /* vring size */
> +       int align;                      /* vring alignment */
> +       int id;                         /* vring id */
> +       int vdev_id;                    /* TMFIFO_VDEV_xxx */
> +       u32 pkt_len;                    /* packet total length */
> +       u16 next_avail;                 /* next avail desc id */
> +       struct mlxbf_tmfifo *fifo;      /* pointer back to the tmfifo */
> +};

Perhaps kernel-doc?
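
A kernel-doc header would look roughly like this (sketch, keeping the
wording of the existing inline comments):

	/**
	 * struct mlxbf_tmfifo_vring - vring state
	 * @va: virtual address of the vring
	 * @dma: dma address of the vring
	 * @vq: virtqueue pointer
	 * @desc: current descriptor
	 * @desc_head: current descriptor head
	 * @cur_len: processed length in the current descriptor
	 * @rem_len: remaining length to be processed
	 * @size: vring size
	 * @align: vring alignment
	 * @id: vring id
	 * @vdev_id: TMFIFO_VDEV_xxx
	 * @pkt_len: total packet length
	 * @next_avail: next available descriptor id
	 * @fifo: pointer back to the tmfifo
	 */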

> +/* Interrupt types. */
> +enum {
> +       MLXBF_TM_RX_LWM_IRQ,            /* Rx low water mark irq */
> +       MLXBF_TM_RX_HWM_IRQ,            /* Rx high water mark irq */
> +       MLXBF_TM_TX_LWM_IRQ,            /* Tx low water mark irq */
> +       MLXBF_TM_TX_HWM_IRQ,            /* Tx high water mark irq */
> +       MLXBF_TM_IRQ_CNT

CNT...

> +};
> +
> +/* Ring types (Rx & Tx). */
> +enum {
> +       MLXBF_TMFIFO_VRING_RX,          /* Rx ring */
> +       MLXBF_TMFIFO_VRING_TX,          /* Tx ring */
> +       MLXBF_TMFIFO_VRING_NUM

...NUM

Perhaps one style for max numbers?

> +};

> +
> +/* Structure for the virtual device. */
> +struct mlxbf_tmfifo_vdev {
> +       struct virtio_device vdev;      /* virtual device */
> +       u8 status;
> +       u64 features;
> +       union {                         /* virtio config space */
> +               struct virtio_console_config cons;
> +               struct virtio_net_config net;
> +       } config;

Describe, which field allows to distinguish what type of the data is in a union.

> +       struct mlxbf_tmfifo_vring vrings[MLXBF_TMFIFO_VRING_NUM];
> +       u8 *tx_buf;                     /* tx buffer */
> +       u32 tx_head;                    /* tx buffer head */
> +       u32 tx_tail;                    /* tx buffer tail */
> +};

kernel-doc?

> +/* Structure of the interrupt information. */
> +struct mlxbf_tmfifo_irq_info {
> +       struct mlxbf_tmfifo *fifo;      /* tmfifo structure */
> +       int irq;                        /* interrupt number */
> +       int index;                      /* array index */
> +};

Ditto.

> +
> +/* Structure of the TmFifo information. */
> +struct mlxbf_tmfifo {
> +       struct mlxbf_tmfifo_vdev *vdev[MLXBF_TMFIFO_VDEV_MAX]; /* devices */
> +       struct platform_device *pdev;   /* platform device */
> +       struct mutex lock;              /* fifo lock */
> +       void __iomem *rx_base;          /* mapped register base */
> +       void __iomem *tx_base;          /* mapped register base */
> +       int tx_fifo_size;               /* number of entries of the Tx FIFO */
> +       int rx_fifo_size;               /* number of entries of the Rx FIFO */
> +       unsigned long pend_events;      /* pending bits for deferred process */
> +       struct mlxbf_tmfifo_irq_info irq_info[MLXBF_TM_IRQ_CNT]; /* irq info */
> +       struct work_struct work;        /* work struct for deferred process */
> +       struct timer_list timer;        /* keepalive timer */
> +       struct mlxbf_tmfifo_vring *vring[2];    /* current Tx/Rx ring */
> +       bool is_ready;                  /* ready flag */
> +       spinlock_t spin_lock;           /* spin lock */
> +};

Ditto.

> +/* Use a union struction for 64-bit little/big endian. */

What does this mean?

> +union mlxbf_tmfifo_data_64bit {
> +       u64 data;
> +       __le64 data_le;
> +};
> +
> +/* Message header used to demux data in the TmFifo. */
> +union mlxbf_tmfifo_msg_hdr {
> +       struct {
> +               u8 type;                /* message type */
> +               __be16 len;             /* payload length */
> +               u8 unused[5];           /* reserved, set to 0 */
> +       } __packed;

It's already packed. No?

> +       union mlxbf_tmfifo_data_64bit u;        /* 64-bit data */
> +};

> +/* MTU setting of the virtio-net interface. */
> +#define MLXBF_TMFIFO_NET_MTU           1500

Don't we have this globally defined?

> +/* Supported virtio-net features. */
> +#define MLXBF_TMFIFO_NET_FEATURES      ((1UL << VIRTIO_NET_F_MTU) | \
> +                                        (1UL << VIRTIO_NET_F_STATUS) | \
> +                                        (1UL << VIRTIO_NET_F_MAC))

BIT_UL() ?
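
I.e. something like the following, assuming BIT_ULL() since the features
field is 64-bit:

	#define MLXBF_TMFIFO_NET_FEATURES	(BIT_ULL(VIRTIO_NET_F_MTU) | \
						 BIT_ULL(VIRTIO_NET_F_STATUS) | \
						 BIT_ULL(VIRTIO_NET_F_MAC))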

> +/* Function declarations. */

Noise.

> +static int mlxbf_tmfifo_remove(struct platform_device *pdev);

Why do you need forward declaration for this?

> +/* Return the consumed Tx buffer space. */
> +static int mlxbf_tmfifo_vdev_tx_buf_len(struct mlxbf_tmfifo_vdev *vdev)
> +{
> +       return ((vdev->tx_tail >= vdev->tx_head) ?
> +               (vdev->tx_tail - vdev->tx_head) :
> +               (MLXBF_TMFIFO_CONS_TX_BUF_SIZE - vdev->tx_head +
> +                vdev->tx_tail));

Split this for better reading.
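
One possible way to split it (same logic):

	if (vdev->tx_tail >= vdev->tx_head)
		return vdev->tx_tail - vdev->tx_head;

	return MLXBF_TMFIFO_CONS_TX_BUF_SIZE - vdev->tx_head + vdev->tx_tail;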

> +}
> +
> +/* Return the available Tx buffer space. */
> +static int mlxbf_tmfifo_vdev_tx_buf_avail(struct mlxbf_tmfifo_vdev *vdev)
> +{
> +       return (MLXBF_TMFIFO_CONS_TX_BUF_RSV_SIZE -
> +               mlxbf_tmfifo_vdev_tx_buf_len(vdev));

Redundant parens.
Moreover, you might consider temporary variable for better reading.
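
E.g. a sketch with a temporary variable:

	int len = mlxbf_tmfifo_vdev_tx_buf_len(vdev);

	return MLXBF_TMFIFO_CONS_TX_BUF_RSV_SIZE - len;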

> +}
> +
> +/* Update Rx/Tx buffer index pointer. */
> +static void mlxbf_tmfifo_vdev_tx_buf_index_inc(u32 *index, u32 len)
> +{
> +       *index += len;
> +       if (*index >= MLXBF_TMFIFO_CONS_TX_BUF_SIZE)
> +               *index -= MLXBF_TMFIFO_CONS_TX_BUF_SIZE;
> +}
> +
> +/* Allocate vrings for the fifo. */
> +static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
> +                                    struct mlxbf_tmfifo_vdev *tm_vdev,
> +                                    int vdev_id)
> +{
> +       struct mlxbf_tmfifo_vring *vring;
> +       dma_addr_t dma;
> +       int i, size;
> +       void *va;
> +
> +       for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
> +               vring = &tm_vdev->vrings[i];
> +               vring->fifo = fifo;
> +               vring->size = MLXBF_TMFIFO_VRING_SIZE;
> +               vring->align = SMP_CACHE_BYTES;
> +               vring->id = i;
> +               vring->vdev_id = vdev_id;
> +

> +               size = PAGE_ALIGN(vring_size(vring->size, vring->align));

Why do you need this?
dma_alloc_coherent() allocates memory on page granularity anyway.

> +               va = dma_alloc_coherent(tm_vdev->vdev.dev.parent, size, &dma,
> +                                       GFP_KERNEL);
> +               if (!va) {

> +                       dev_err(tm_vdev->vdev.dev.parent,

Would be much easy if you have temporary variable for this device.

> +                               "dma_alloc_coherent failed\n");
> +                       return -ENOMEM;
> +               }
> +
> +               vring->va = va;
> +               vring->dma = dma;
> +       }
> +
> +       return 0;
> +}

> +/* Interrupt handler. */
> +static irqreturn_t mlxbf_tmfifo_irq_handler(int irq, void *arg)
> +{
> +       struct mlxbf_tmfifo_irq_info *irq_info;
> +
> +       irq_info = (struct mlxbf_tmfifo_irq_info *)arg;

Useless casting.
Assignment can be done in definition block.
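
I.e. simply:

	struct mlxbf_tmfifo_irq_info *irq_info = arg;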

> +       if (irq_info->index < MLXBF_TM_IRQ_CNT &&
> +           !test_and_set_bit(irq_info->index, &irq_info->fifo->pend_events))
> +               schedule_work(&irq_info->fifo->work);
> +
> +       return IRQ_HANDLED;
> +}
> +
> +/* Get the next packet descriptor from the vring. */
> +static struct vring_desc *mlxbf_tmfifo_get_next_desc(struct virtqueue *vq)
> +{
> +       struct mlxbf_tmfifo_vring *vring;
> +       unsigned int idx, head;
> +       struct vring *vr;
> +
> +       vr = (struct vring *)virtqueue_get_vring(vq);

Return type is different? Is it safe to cast? Why?

> +       if (!vr)
> +               return NULL;

+ blank line

> +       vring = (struct mlxbf_tmfifo_vring *)vq->priv;

Do you need explicit casting?

> +       if (vring->next_avail == virtio16_to_cpu(vq->vdev, vr->avail->idx))
> +               return NULL;

+blank line

> +       idx = vring->next_avail % vr->num;
> +       head = virtio16_to_cpu(vq->vdev, vr->avail->ring[idx]);
> +       if (WARN_ON(head >= vr->num))
> +               return NULL;
> +       vring->next_avail++;
> +
> +       return &vr->desc[head];
> +}
> +
> +/* Release virtio descriptor. */
> +static void mlxbf_tmfifo_release_desc(struct virtio_device *vdev,
> +                                     struct vring *vr, struct vring_desc *desc,
> +                                     u32 len)
> +{
> +       u16 idx, vr_idx;
> +
> +       vr_idx = virtio16_to_cpu(vdev, vr->used->idx);
> +       idx = vr_idx % vr->num;
> +       vr->used->ring[idx].id = cpu_to_virtio32(vdev, desc - vr->desc);
> +       vr->used->ring[idx].len = cpu_to_virtio32(vdev, len);
> +
> +       /* Virtio could poll and check the 'idx' to decide
> +        * whether the desc is done or not. Add a memory
> +        * barrier here to make sure the update above completes
> +        * before updating the idx.
> +        */

Multi-line comment style is broken.
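
The usual style would be:

	/*
	 * Virtio could poll and check the 'idx' to decide whether the desc
	 * is done or not. Add a memory barrier here to make sure the update
	 * above completes before updating the idx.
	 */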

> +       mb();
> +       vr->used->idx = cpu_to_virtio16(vdev, vr_idx + 1);
> +}

> +/* House-keeping timer. */
> +static void mlxbf_tmfifo_timer(struct timer_list *arg)
> +{
> +       struct mlxbf_tmfifo *fifo;

> +       fifo = container_of(arg, struct mlxbf_tmfifo, timer);

Can't be done in the definition block?

> +       /*
> +        * Wake up the work handler to poll the Rx FIFO in case interrupt
> +        * missing or any leftover bytes stuck in the FIFO.
> +        */
> +       test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events);

How do you utilize test results?

> +
> +       /*
> +        * Wake up Tx handler in case virtio has queued too many packets
> +        * and are waiting for buffer return.
> +        */
> +       test_and_set_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events);

Ditto.

> +
> +       schedule_work(&fifo->work);
> +
> +       mod_timer(&fifo->timer, jiffies + mlxbf_tmfifo_timer_interval);
> +}

> +       /* Adjust the size to available space. */
> +       if (size + sizeof(hdr) > avail * sizeof(u64))
> +               size = avail * sizeof(u64) - sizeof(hdr);

Can avail be 0?

> +       /* Write header. */
> +       hdr.u.data = 0;
> +       hdr.type = VIRTIO_ID_CONSOLE;
> +       hdr.len = htons(size);
> +       hdr.u.data_le = cpu_to_le64(hdr.u.data);

> +       writeq(hdr.u.data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);

So, this one is not protected anyhow? Potential race condition?

> +
> +       spin_lock_irqsave(&fifo->spin_lock, flags);
> +
> +       while (size > 0) {
> +               addr = cons->tx_buf + cons->tx_head;
> +
> +               if (cons->tx_head + sizeof(u64) <=
> +                   MLXBF_TMFIFO_CONS_TX_BUF_SIZE) {
> +                       memcpy(&data, addr, sizeof(u64));
> +               } else {
> +                       partial = MLXBF_TMFIFO_CONS_TX_BUF_SIZE - cons->tx_head;
> +                       memcpy(&data, addr, partial);
> +                       memcpy((u8 *)&data + partial, cons->tx_buf,
> +                              sizeof(u64) - partial);
> +               }
> +               writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
> +
> +               if (size >= sizeof(u64)) {
> +                       mlxbf_tmfifo_vdev_tx_buf_index_inc(&cons->tx_head,
> +                                                          sizeof(u64));
> +                       size -= sizeof(u64);
> +               } else {
> +                       mlxbf_tmfifo_vdev_tx_buf_index_inc(&cons->tx_head,
> +                                                          size);
> +                       size = 0;
> +               }
> +       }
> +
> +       spin_unlock_irqrestore(&fifo->spin_lock, flags);
> +}

> +       /* Rx/Tx one word (8 bytes) if not done. */
> +       if (vring->cur_len != len)
> +               mlxbf_tmfifo_rxtx_word(fifo, vdev, vring, desc, is_rx, avail,
> +                                      len);

In such case better to keep it in one line.

> +/* Get the array of feature bits for this device. */
> +static u64 mlxbf_tmfifo_virtio_get_features(struct virtio_device *vdev)
> +{
> +       struct mlxbf_tmfifo_vdev *tm_vdev;
> +
> +       tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
> +       return tm_vdev->features;
> +}
> +
> +/* Confirm device features to use. */
> +static int mlxbf_tmfifo_virtio_finalize_features(struct virtio_device *vdev)
> +{
> +       struct mlxbf_tmfifo_vdev *tm_vdev;
> +

> +       tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);

This is candidate to be a macro

#define mlxbt_vdev_to_tmfifo(...) ...
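
E.g. a sketch of such a helper (the name here is just one possible spelling):

	#define mlxbf_vdev_to_tmfifo(_vdev) \
		container_of(_vdev, struct mlxbf_tmfifo_vdev, vdev)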

> +       tm_vdev->features = vdev->features;
> +
> +       return 0;
> +}

> +/* Create vdev type in a tmfifo. */
> +static int mlxbf_tmfifo_create_vdev(struct device *dev,
> +                                   struct mlxbf_tmfifo *fifo,
> +                                   int vdev_id, u64 features,
> +                                   void *config, u32 size)
> +{
> +       struct mlxbf_tmfifo_vdev *tm_vdev;
> +       int ret = 0;
> +
> +       mutex_lock(&fifo->lock);
> +
> +       tm_vdev = fifo->vdev[vdev_id];
> +       if (tm_vdev) {
> +               dev_err(dev, "vdev %d already exists\n", vdev_id);
> +               ret = -EEXIST;
> +               goto fail;
> +       }
> +
> +       tm_vdev = devm_kzalloc(dev, sizeof(*tm_vdev), GFP_KERNEL);
> +       if (!tm_vdev) {
> +               ret = -ENOMEM;
> +               goto fail;
> +       }
> +
> +       tm_vdev->vdev.id.device = vdev_id;
> +       tm_vdev->vdev.config = &mlxbf_tmfifo_virtio_config_ops;
> +       tm_vdev->vdev.dev.parent = &fifo->pdev->dev;
> +       tm_vdev->features = features;
> +       if (config)
> +               memcpy(&tm_vdev->config, config, size);
> +       if (mlxbf_tmfifo_alloc_vrings(fifo, tm_vdev, vdev_id)) {
> +               dev_err(dev, "unable to allocate vring\n");
> +               ret = -ENOMEM;
> +               goto fail;
> +       }
> +       if (vdev_id == VIRTIO_ID_CONSOLE)

> +               tm_vdev->tx_buf = devm_kmalloc(dev,
> +                                              MLXBF_TMFIFO_CONS_TX_BUF_SIZE,
> +                                              GFP_KERNEL);

Are you sure devm_ suits here?

> +       fifo->vdev[vdev_id] = tm_vdev;
> +
> +       /* Register the virtio device. */
> +       ret = register_virtio_device(&tm_vdev->vdev);
> +       if (ret) {
> +               dev_err(&fifo->pdev->dev, "register_virtio_device failed\n");
> +               goto register_fail;
> +       }
> +
> +       mutex_unlock(&fifo->lock);
> +       return 0;
> +
> +register_fail:
> +       mlxbf_tmfifo_free_vrings(fifo, vdev_id);
> +       fifo->vdev[vdev_id] = NULL;
> +fail:
> +       mutex_unlock(&fifo->lock);
> +       return ret;
> +}

> +/* Read the configured network MAC address from efi variable. */
> +static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
> +{
> +       efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
> +       efi_status_t status;
> +       unsigned long size;
> +       u8 buf[6];
> +
> +       size = sizeof(buf);
> +       status = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size,
> +                                 buf);
> +       if (status == EFI_SUCCESS && size == sizeof(buf))
> +               memcpy(mac, buf, sizeof(buf));
> +}

Shouldn't be rather helper in EFI lib in kernel?

> +/* Probe the TMFIFO. */
> +static int mlxbf_tmfifo_probe(struct platform_device *pdev)
> +{
> +       struct virtio_net_config net_config;
> +       struct resource *rx_res, *tx_res;
> +       struct mlxbf_tmfifo *fifo;
> +       int i, ret;
> +
> +       /* Get the resource of the Rx FIFO. */
> +       rx_res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +       if (!rx_res)
> +               return -ENODEV;
> +
> +       /* Get the resource of the Tx FIFO. */
> +       tx_res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
> +       if (!tx_res)
> +               return -ENODEV;
> +
> +       if (!devm_request_mem_region(&pdev->dev, rx_res->start,
> +                                    resource_size(rx_res), "bf-tmfifo"))
> +               return -EBUSY;
> +
> +       if (!devm_request_mem_region(&pdev->dev, tx_res->start,
> +                                    resource_size(tx_res), "bf-tmfifo"))
> +               return -EBUSY;
> +
> +       fifo = devm_kzalloc(&pdev->dev, sizeof(*fifo), GFP_KERNEL);
> +       if (!fifo)
> +               return -ENOMEM;
> +
> +       fifo->pdev = pdev;
> +       platform_set_drvdata(pdev, fifo);
> +
> +       spin_lock_init(&fifo->spin_lock);
> +       INIT_WORK(&fifo->work, mlxbf_tmfifo_work_handler);
> +
> +       timer_setup(&fifo->timer, mlxbf_tmfifo_timer, 0);
> +
> +       for (i = 0; i < MLXBF_TM_IRQ_CNT; i++) {
> +               fifo->irq_info[i].index = i;
> +               fifo->irq_info[i].fifo = fifo;

> +               fifo->irq_info[i].irq = platform_get_irq(pdev, i);


> +               ret = devm_request_irq(&pdev->dev, fifo->irq_info[i].irq,
> +                                      mlxbf_tmfifo_irq_handler, 0,
> +                                      "tmfifo", &fifo->irq_info[i]);
> +               if (ret) {
> +                       dev_err(&pdev->dev, "devm_request_irq failed\n");
> +                       fifo->irq_info[i].irq = 0;
> +                       return ret;
> +               }
> +       }
> +

> +       fifo->rx_base = devm_ioremap(&pdev->dev, rx_res->start,
> +                                    resource_size(rx_res));
> +       if (!fifo->rx_base)
> +               return -ENOMEM;
> +
> +       fifo->tx_base = devm_ioremap(&pdev->dev, tx_res->start,
> +                                    resource_size(tx_res));
> +       if (!fifo->tx_base)
> +               return -ENOMEM;

Switch to devm_ioremap_resource().
However, I think you probably need memremap().
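
With devm_ioremap_resource() the request + map pair collapses into
something like this sketch, which would also make the earlier
devm_request_mem_region() calls unnecessary:

	fifo->rx_base = devm_ioremap_resource(&pdev->dev, rx_res);
	if (IS_ERR(fifo->rx_base))
		return PTR_ERR(fifo->rx_base);

	fifo->tx_base = devm_ioremap_resource(&pdev->dev, tx_res);
	if (IS_ERR(fifo->tx_base))
		return PTR_ERR(fifo->tx_base);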

> +       mutex_init(&fifo->lock);

Isn't too late for initializing this one?


> +/* Device remove function. */
> +static int mlxbf_tmfifo_remove(struct platform_device *pdev)
> +{
> +       struct mlxbf_tmfifo *fifo = platform_get_drvdata(pdev);
> +

> +       if (fifo)

How is it possible to be not true?

> +               mlxbf_tmfifo_cleanup(fifo);
> +

> +       platform_set_drvdata(pdev, NULL);

Redundant.

> +
> +       return 0;
> +}

> +MODULE_LICENSE("GPL");

Is it correct?

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v9] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
  2019-02-13 18:11   ` Andy Shevchenko
@ 2019-02-13 18:34     ` Liming Sun
  2019-02-14 16:25     ` Liming Sun
  2019-02-28 15:51     ` Liming Sun
  2 siblings, 0 replies; 30+ messages in thread
From: Liming Sun @ 2019-02-13 18:34 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak,
	Linux Kernel Mailing List, Platform Driver

Thanks Andy. We actually had some internal reviews as you mentioned.
I'll try to address the comments and update the 'Reviewed-by' in the next revision.

Regards,
Liming

> -----Original Message-----
> From: Andy Shevchenko <andy.shevchenko@gmail.com>
> Sent: Wednesday, February 13, 2019 1:11 PM
> To: Liming Sun <lsun@mellanox.com>
> Cc: David Woods <dwoods@mellanox.com>; Andy Shevchenko <andy@infradead.org>; Darren Hart <dvhart@infradead.org>; Vadim
> Pasternak <vadimp@mellanox.com>; Linux Kernel Mailing List <linux-kernel@vger.kernel.org>; Platform Driver <platform-driver-
> x86@vger.kernel.org>
> Subject: Re: [PATCH v9] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
> 
> On Wed, Feb 13, 2019 at 3:27 PM Liming Sun <lsun@mellanox.com> wrote:
> >
> > This commit adds the TmFifo platform driver for Mellanox BlueField
> > Soc. TmFifo is a shared FIFO which enables external host machine
> > to exchange data with the SoC via USB or PCIe. The driver is based
> > on virtio framework and has console and network access enabled.
> 
> Thanks for an update, my comments below.
> 
> Again, to Mellanox: guys, please, establish internal mailing list for
> review and don't come with such quality of code.
> 
> Next time I would like to see Reviewed-by from Mellanox people I know,
> like Vadim or Leon.
> 
> > +config MLXBF_TMFIFO
> > +       tristate "Mellanox BlueField SoC TmFifo platform driver"
> 
> > +       depends on ARM64 && ACPI && VIRTIO_CONSOLE && VIRTIO_NET
> 
> Split this to three logical parts.
> 
> > +       help
> > +         Say y here to enable TmFifo support. The TmFifo driver provides
> > +          platform driver support for the TmFifo which supports console
> > +          and networking based on the virtio framework.
> 
> >  obj-$(CONFIG_MLXREG_HOTPLUG)   += mlxreg-hotplug.o
> >  obj-$(CONFIG_MLXREG_IO) += mlxreg-io.o
> > +obj-$(CONFIG_MLXBF_TMFIFO)     += mlxbf-tmfifo.o
> 
> I would suggest to keep it sorted.
> 
> > +#define MLXBF_TMFIFO_TX_DATA 0x0
> 
> I suggest to use same fixed format for offsets.
> Here, for example, 0x00 would be better.
> 
> > +#define MLXBF_TMFIFO_TX_STS__COUNT_RMASK 0x1ff
> > +#define MLXBF_TMFIFO_TX_STS__COUNT_MASK  0x1ff
> 
> #include <linux/bits.h>
> ...
> GENMASK()
> 
> > +#define MLXBF_TMFIFO_TX_CTL__LWM_RMASK 0xff
> > +#define MLXBF_TMFIFO_TX_CTL__LWM_MASK  0xff
> 
> > +#define MLXBF_TMFIFO_TX_CTL__HWM_RMASK 0xff
> > +#define MLXBF_TMFIFO_TX_CTL__HWM_MASK  0xff00
> 
> > +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RMASK 0x1ff
> > +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK  0x1ff00000000ULL
> 
> GENMASK() / GENMASK_ULL()
> 
> > +#define MLXBF_TMFIFO_RX_STS__COUNT_RMASK 0x1ff
> > +#define MLXBF_TMFIFO_RX_STS__COUNT_MASK  0x1ff
> 
> GENMASK()
> 
> > +#define MLXBF_TMFIFO_RX_CTL__LWM_RMASK 0xff
> > +#define MLXBF_TMFIFO_RX_CTL__LWM_MASK  0xff
> 
> > +#define MLXBF_TMFIFO_RX_CTL__HWM_RMASK 0xff
> > +#define MLXBF_TMFIFO_RX_CTL__HWM_MASK  0xff00
> 
> > +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK 0x1ff
> > +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK  0x1ff00000000ULL
> 
> Ditto.
> 
> > +#include <linux/acpi.h>
> > +#include <linux/byteorder/generic.h>
> > +#include <linux/bitfield.h>
> > +#include <linux/cache.h>
> > +#include <linux/device.h>
> > +#include <linux/dma-mapping.h>
> > +#include <linux/efi.h>
> > +#include <linux/io.h>
> > +#include <linux/interrupt.h>
> > +#include <linux/irq.h>
> > +#include <linux/kernel.h>
> > +#include <linux/math64.h>
> > +#include <linux/module.h>
> > +#include <linux/moduleparam.h>
> > +#include <linux/mutex.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/resource.h>
> > +#include <linux/slab.h>
> > +#include <linux/types.h>
> > +#include <linux/version.h>
> > +#include <linux/virtio.h>
> > +#include <linux/virtio_config.h>
> > +#include <linux/virtio_console.h>
> > +#include <linux/virtio_ids.h>
> > +#include <linux/virtio_net.h>
> > +#include <linux/virtio_ring.h>
> 
> Do you need all of them?
> 
> > +#define MLXBF_TMFIFO_VRING_SIZE                        1024
> 
> SZ_1K ?
> 
> > +/* Console Tx buffer size. */
> > +#define MLXBF_TMFIFO_CONS_TX_BUF_SIZE          (32 * 1024)
> 
> SZ_32K ?
> 
> > +/* House-keeping timer interval. */
> > +static int mlxbf_tmfifo_timer_interval = HZ / 10;
> 
> > +/* Global lock. */
> 
> Noise. Either explain what it protects, or remove.
> 
> > +static DEFINE_MUTEX(mlxbf_tmfifo_lock);
> 
> > +/* Struct declaration. */
> 
> Noise.
> 
> > +/* Structure to maintain the ring state. */
> > +struct mlxbf_tmfifo_vring {
> > +       void *va;                       /* virtual address */
> > +       dma_addr_t dma;                 /* dma address */
> > +       struct virtqueue *vq;           /* virtqueue pointer */
> > +       struct vring_desc *desc;        /* current desc */
> > +       struct vring_desc *desc_head;   /* current desc head */
> > +       int cur_len;                    /* processed len in current desc */
> > +       int rem_len;                    /* remaining length to be processed */
> > +       int size;                       /* vring size */
> > +       int align;                      /* vring alignment */
> > +       int id;                         /* vring id */
> > +       int vdev_id;                    /* TMFIFO_VDEV_xxx */
> > +       u32 pkt_len;                    /* packet total length */
> > +       u16 next_avail;                 /* next avail desc id */
> > +       struct mlxbf_tmfifo *fifo;      /* pointer back to the tmfifo */
> > +};
> 
> Perhaps kernel-doc?
> 
> > +/* Interrupt types. */
> > +enum {
> > +       MLXBF_TM_RX_LWM_IRQ,            /* Rx low water mark irq */
> > +       MLXBF_TM_RX_HWM_IRQ,            /* Rx high water mark irq */
> > +       MLXBF_TM_TX_LWM_IRQ,            /* Tx low water mark irq */
> > +       MLXBF_TM_TX_HWM_IRQ,            /* Tx high water mark irq */
> > +       MLXBF_TM_IRQ_CNT
> 
> CNT...
> 
> > +};
> > +
> > +/* Ring types (Rx & Tx). */
> > +enum {
> > +       MLXBF_TMFIFO_VRING_RX,          /* Rx ring */
> > +       MLXBF_TMFIFO_VRING_TX,          /* Tx ring */
> > +       MLXBF_TMFIFO_VRING_NUM
> 
> ...NUM
> 
> Perhaps one style for max numbers?
> 
> > +};
> 
> > +
> > +/* Structure for the virtual device. */
> > +struct mlxbf_tmfifo_vdev {
> > +       struct virtio_device vdev;      /* virtual device */
> > +       u8 status;
> > +       u64 features;
> > +       union {                         /* virtio config space */
> > +               struct virtio_console_config cons;
> > +               struct virtio_net_config net;
> > +       } config;
> 
> Describe, which field allows to distinguish what type of the data is in a union.
> 
> > +       struct mlxbf_tmfifo_vring vrings[MLXBF_TMFIFO_VRING_NUM];
> > +       u8 *tx_buf;                     /* tx buffer */
> > +       u32 tx_head;                    /* tx buffer head */
> > +       u32 tx_tail;                    /* tx buffer tail */
> > +};
> 
> kernel-doc?
> 
> > +/* Structure of the interrupt information. */
> > +struct mlxbf_tmfifo_irq_info {
> > +       struct mlxbf_tmfifo *fifo;      /* tmfifo structure */
> > +       int irq;                        /* interrupt number */
> > +       int index;                      /* array index */
> > +};
> 
> Ditto.
> 
> > +
> > +/* Structure of the TmFifo information. */
> > +struct mlxbf_tmfifo {
> > +       struct mlxbf_tmfifo_vdev *vdev[MLXBF_TMFIFO_VDEV_MAX]; /* devices */
> > +       struct platform_device *pdev;   /* platform device */
> > +       struct mutex lock;              /* fifo lock */
> > +       void __iomem *rx_base;          /* mapped register base */
> > +       void __iomem *tx_base;          /* mapped register base */
> > +       int tx_fifo_size;               /* number of entries of the Tx FIFO */
> > +       int rx_fifo_size;               /* number of entries of the Rx FIFO */
> > +       unsigned long pend_events;      /* pending bits for deferred process */
> > +       struct mlxbf_tmfifo_irq_info irq_info[MLXBF_TM_IRQ_CNT]; /* irq info */
> > +       struct work_struct work;        /* work struct for deferred process */
> > +       struct timer_list timer;        /* keepalive timer */
> > +       struct mlxbf_tmfifo_vring *vring[2];    /* current Tx/Rx ring */
> > +       bool is_ready;                  /* ready flag */
> > +       spinlock_t spin_lock;           /* spin lock */
> > +};
> 
> Ditto.
> 
> > +/* Use a union struction for 64-bit little/big endian. */
> 
> What does this mean?
> 
> > +union mlxbf_tmfifo_data_64bit {
> > +       u64 data;
> > +       __le64 data_le;
> > +};
> > +
> > +/* Message header used to demux data in the TmFifo. */
> > +union mlxbf_tmfifo_msg_hdr {
> > +       struct {
> > +               u8 type;                /* message type */
> > +               __be16 len;             /* payload length */
> > +               u8 unused[5];           /* reserved, set to 0 */
> > +       } __packed;
> 
> It's already packed. No?
> 
> > +       union mlxbf_tmfifo_data_64bit u;        /* 64-bit data */
> > +};
> 
> > +/* MTU setting of the virtio-net interface. */
> > +#define MLXBF_TMFIFO_NET_MTU           1500
> 
> Don't we have this globally defined?
> 
> > +/* Supported virtio-net features. */
> > +#define MLXBF_TMFIFO_NET_FEATURES      ((1UL << VIRTIO_NET_F_MTU) | \
> > +                                        (1UL << VIRTIO_NET_F_STATUS) | \
> > +                                        (1UL << VIRTIO_NET_F_MAC))
> 
> BIT_UL() ?
> 
> > +/* Function declarations. */
> 
> Noise.
> 
> > +static int mlxbf_tmfifo_remove(struct platform_device *pdev);
> 
> Why do you need forward declaration for this?
> 
> > +/* Return the consumed Tx buffer space. */
> > +static int mlxbf_tmfifo_vdev_tx_buf_len(struct mlxbf_tmfifo_vdev *vdev)
> > +{
> > +       return ((vdev->tx_tail >= vdev->tx_head) ?
> > +               (vdev->tx_tail - vdev->tx_head) :
> > +               (MLXBF_TMFIFO_CONS_TX_BUF_SIZE - vdev->tx_head +
> > +                vdev->tx_tail));
> 
> Split this for better reading.
> 
> > +}
> > +
> > +/* Return the available Tx buffer space. */
> > +static int mlxbf_tmfifo_vdev_tx_buf_avail(struct mlxbf_tmfifo_vdev *vdev)
> > +{
> > +       return (MLXBF_TMFIFO_CONS_TX_BUF_RSV_SIZE -
> > +               mlxbf_tmfifo_vdev_tx_buf_len(vdev));
> 
> Redundant parens.
> Moreover, you might consider temporary variable for better reading.
> 
> > +}
> > +
> > +/* Update Rx/Tx buffer index pointer. */
> > +static void mlxbf_tmfifo_vdev_tx_buf_index_inc(u32 *index, u32 len)
> > +{
> > +       *index += len;
> > +       if (*index >= MLXBF_TMFIFO_CONS_TX_BUF_SIZE)
> > +               *index -= MLXBF_TMFIFO_CONS_TX_BUF_SIZE;
> > +}
> > +
> > +/* Allocate vrings for the fifo. */
> > +static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
> > +                                    struct mlxbf_tmfifo_vdev *tm_vdev,
> > +                                    int vdev_id)
> > +{
> > +       struct mlxbf_tmfifo_vring *vring;
> > +       dma_addr_t dma;
> > +       int i, size;
> > +       void *va;
> > +
> > +       for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
> > +               vring = &tm_vdev->vrings[i];
> > +               vring->fifo = fifo;
> > +               vring->size = MLXBF_TMFIFO_VRING_SIZE;
> > +               vring->align = SMP_CACHE_BYTES;
> > +               vring->id = i;
> > +               vring->vdev_id = vdev_id;
> > +
> 
> > +               size = PAGE_ALIGN(vring_size(vring->size, vring->align));
> 
> Why do you need this?
> dma_alloc_coherent() allocates memory on page granularity anyway.
> 
> > +               va = dma_alloc_coherent(tm_vdev->vdev.dev.parent, size, &dma,
> > +                                       GFP_KERNEL);
> > +               if (!va) {
> 
> > +                       dev_err(tm_vdev->vdev.dev.parent,
> 
> Would be much easy if you have temporary variable for this device.
> 
> > +                               "dma_alloc_coherent failed\n");
> > +                       return -ENOMEM;
> > +               }
> > +
> > +               vring->va = va;
> > +               vring->dma = dma;
> > +       }
> > +
> > +       return 0;
> > +}
> 
> > +/* Interrupt handler. */
> > +static irqreturn_t mlxbf_tmfifo_irq_handler(int irq, void *arg)
> > +{
> > +       struct mlxbf_tmfifo_irq_info *irq_info;
> > +
> > +       irq_info = (struct mlxbf_tmfifo_irq_info *)arg;
> 
> Useless casting.
> Assignment can be done in definition block.
> 
> > +       if (irq_info->index < MLXBF_TM_IRQ_CNT &&
> > +           !test_and_set_bit(irq_info->index, &irq_info->fifo->pend_events))
> > +               schedule_work(&irq_info->fifo->work);
> > +
> > +       return IRQ_HANDLED;
> > +}
> > +
> > +/* Get the next packet descriptor from the vring. */
> > +static struct vring_desc *mlxbf_tmfifo_get_next_desc(struct virtqueue *vq)
> > +{
> > +       struct mlxbf_tmfifo_vring *vring;
> > +       unsigned int idx, head;
> > +       struct vring *vr;
> > +
> > +       vr = (struct vring *)virtqueue_get_vring(vq);
> 
> Return type is different? Is it safe to cast? Why?
> 
> > +       if (!vr)
> > +               return NULL;
> 
> + blank line
> 
> > +       vring = (struct mlxbf_tmfifo_vring *)vq->priv;
> 
> Do you need explicit casting?
> 
> > +       if (vring->next_avail == virtio16_to_cpu(vq->vdev, vr->avail->idx))
> > +               return NULL;
> 
> +blank line
> 
> > +       idx = vring->next_avail % vr->num;
> > +       head = virtio16_to_cpu(vq->vdev, vr->avail->ring[idx]);
> > +       if (WARN_ON(head >= vr->num))
> > +               return NULL;
> > +       vring->next_avail++;
> > +
> > +       return &vr->desc[head];
> > +}
> > +
> > +/* Release virtio descriptor. */
> > +static void mlxbf_tmfifo_release_desc(struct virtio_device *vdev,
> > +                                     struct vring *vr, struct vring_desc *desc,
> > +                                     u32 len)
> > +{
> > +       u16 idx, vr_idx;
> > +
> > +       vr_idx = virtio16_to_cpu(vdev, vr->used->idx);
> > +       idx = vr_idx % vr->num;
> > +       vr->used->ring[idx].id = cpu_to_virtio32(vdev, desc - vr->desc);
> > +       vr->used->ring[idx].len = cpu_to_virtio32(vdev, len);
> > +
> > +       /* Virtio could poll and check the 'idx' to decide
> > +        * whether the desc is done or not. Add a memory
> > +        * barrier here to make sure the update above completes
> > +        * before updating the idx.
> > +        */
> 
> Multi-line comment style is broken.
> 
> > +       mb();
> > +       vr->used->idx = cpu_to_virtio16(vdev, vr_idx + 1);
> > +}
> 
> > +/* House-keeping timer. */
> > +static void mlxbf_tmfifo_timer(struct timer_list *arg)
> > +{
> > +       struct mlxbf_tmfifo *fifo;
> 
> > +       fifo = container_of(arg, struct mlxbf_tmfifo, timer);
> 
> Can't be done in the definition block?
> 
> > +       /*
> > +        * Wake up the work handler to poll the Rx FIFO in case interrupt
> > +        * missing or any leftover bytes stuck in the FIFO.
> > +        */
> > +       test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events);
> 
> How do you utilize test results?
> 
> > +
> > +       /*
> > +        * Wake up Tx handler in case virtio has queued too many packets
> > +        * and are waiting for buffer return.
> > +        */
> > +       test_and_set_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events);
> 
> Ditto.
> 
> > +
> > +       schedule_work(&fifo->work);
> > +
> > +       mod_timer(&fifo->timer, jiffies + mlxbf_tmfifo_timer_interval);
> > +}
> 
> > +       /* Adjust the size to available space. */
> > +       if (size + sizeof(hdr) > avail * sizeof(u64))
> > +               size = avail * sizeof(u64) - sizeof(hdr);
> 
> Can avail be 0?
> 
> > +       /* Write header. */
> > +       hdr.u.data = 0;
> > +       hdr.type = VIRTIO_ID_CONSOLE;
> > +       hdr.len = htons(size);
> > +       hdr.u.data_le = cpu_to_le64(hdr.u.data);
> 
> > +       writeq(hdr.u.data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
> 
> So, this one is not protected anyhow? Potential race condition?
> 
> > +
> > +       spin_lock_irqsave(&fifo->spin_lock, flags);
> > +
> > +       while (size > 0) {
> > +               addr = cons->tx_buf + cons->tx_head;
> > +
> > +               if (cons->tx_head + sizeof(u64) <=
> > +                   MLXBF_TMFIFO_CONS_TX_BUF_SIZE) {
> > +                       memcpy(&data, addr, sizeof(u64));
> > +               } else {
> > +                       partial = MLXBF_TMFIFO_CONS_TX_BUF_SIZE - cons->tx_head;
> > +                       memcpy(&data, addr, partial);
> > +                       memcpy((u8 *)&data + partial, cons->tx_buf,
> > +                              sizeof(u64) - partial);
> > +               }
> > +               writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
> > +
> > +               if (size >= sizeof(u64)) {
> > +                       mlxbf_tmfifo_vdev_tx_buf_index_inc(&cons->tx_head,
> > +                                                          sizeof(u64));
> > +                       size -= sizeof(u64);
> > +               } else {
> > +                       mlxbf_tmfifo_vdev_tx_buf_index_inc(&cons->tx_head,
> > +                                                          size);
> > +                       size = 0;
> > +               }
> > +       }
> > +
> > +       spin_unlock_irqrestore(&fifo->spin_lock, flags);
> > +}
> 
> > +       /* Rx/Tx one word (8 bytes) if not done. */
> > +       if (vring->cur_len != len)
> > +               mlxbf_tmfifo_rxtx_word(fifo, vdev, vring, desc, is_rx, avail,
> > +                                      len);
> 
> In such case better to keep it in one line.
> 
> > +/* Get the array of feature bits for this device. */
> > +static u64 mlxbf_tmfifo_virtio_get_features(struct virtio_device *vdev)
> > +{
> > +       struct mlxbf_tmfifo_vdev *tm_vdev;
> > +
> > +       tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
> > +       return tm_vdev->features;
> > +}
> > +
> > +/* Confirm device features to use. */
> > +static int mlxbf_tmfifo_virtio_finalize_features(struct virtio_device *vdev)
> > +{
> > +       struct mlxbf_tmfifo_vdev *tm_vdev;
> > +
> 
> > +       tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
> 
> This is candidate to be a macro
> 
> #define mlxbt_vdev_to_tmfifo(...) ...
> 
> > +       tm_vdev->features = vdev->features;
> > +
> > +       return 0;
> > +}
> 
> > +/* Create vdev type in a tmfifo. */
> > +static int mlxbf_tmfifo_create_vdev(struct device *dev,
> > +                                   struct mlxbf_tmfifo *fifo,
> > +                                   int vdev_id, u64 features,
> > +                                   void *config, u32 size)
> > +{
> > +       struct mlxbf_tmfifo_vdev *tm_vdev;
> > +       int ret = 0;
> > +
> > +       mutex_lock(&fifo->lock);
> > +
> > +       tm_vdev = fifo->vdev[vdev_id];
> > +       if (tm_vdev) {
> > +               dev_err(dev, "vdev %d already exists\n", vdev_id);
> > +               ret = -EEXIST;
> > +               goto fail;
> > +       }
> > +
> > +       tm_vdev = devm_kzalloc(dev, sizeof(*tm_vdev), GFP_KERNEL);
> > +       if (!tm_vdev) {
> > +               ret = -ENOMEM;
> > +               goto fail;
> > +       }
> > +
> > +       tm_vdev->vdev.id.device = vdev_id;
> > +       tm_vdev->vdev.config = &mlxbf_tmfifo_virtio_config_ops;
> > +       tm_vdev->vdev.dev.parent = &fifo->pdev->dev;
> > +       tm_vdev->features = features;
> > +       if (config)
> > +               memcpy(&tm_vdev->config, config, size);
> > +       if (mlxbf_tmfifo_alloc_vrings(fifo, tm_vdev, vdev_id)) {
> > +               dev_err(dev, "unable to allocate vring\n");
> > +               ret = -ENOMEM;
> > +               goto fail;
> > +       }
> > +       if (vdev_id == VIRTIO_ID_CONSOLE)
> 
> > +               tm_vdev->tx_buf = devm_kmalloc(dev,
> > +                                              MLXBF_TMFIFO_CONS_TX_BUF_SIZE,
> > +                                              GFP_KERNEL);
> 
> Are you sure devm_ suits here?
> 
> > +       fifo->vdev[vdev_id] = tm_vdev;
> > +
> > +       /* Register the virtio device. */
> > +       ret = register_virtio_device(&tm_vdev->vdev);
> > +       if (ret) {
> > +               dev_err(&fifo->pdev->dev, "register_virtio_device failed\n");
> > +               goto register_fail;
> > +       }
> > +
> > +       mutex_unlock(&fifo->lock);
> > +       return 0;
> > +
> > +register_fail:
> > +       mlxbf_tmfifo_free_vrings(fifo, vdev_id);
> > +       fifo->vdev[vdev_id] = NULL;
> > +fail:
> > +       mutex_unlock(&fifo->lock);
> > +       return ret;
> > +}
> 
> > +/* Read the configured network MAC address from efi variable. */
> > +static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
> > +{
> > +       efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
> > +       efi_status_t status;
> > +       unsigned long size;
> > +       u8 buf[6];
> > +
> > +       size = sizeof(buf);
> > +       status = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size,
> > +                                 buf);
> > +       if (status == EFI_SUCCESS && size == sizeof(buf))
> > +               memcpy(mac, buf, sizeof(buf));
> > +}
> 
> Shouldn't be rather helper in EFI lib in kernel?
> 
> > +/* Probe the TMFIFO. */
> > +static int mlxbf_tmfifo_probe(struct platform_device *pdev)
> > +{
> > +       struct virtio_net_config net_config;
> > +       struct resource *rx_res, *tx_res;
> > +       struct mlxbf_tmfifo *fifo;
> > +       int i, ret;
> > +
> > +       /* Get the resource of the Rx FIFO. */
> > +       rx_res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > +       if (!rx_res)
> > +               return -ENODEV;
> > +
> > +       /* Get the resource of the Tx FIFO. */
> > +       tx_res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
> > +       if (!tx_res)
> > +               return -ENODEV;
> > +
> > +       if (!devm_request_mem_region(&pdev->dev, rx_res->start,
> > +                                    resource_size(rx_res), "bf-tmfifo"))
> > +               return -EBUSY;
> > +
> > +       if (!devm_request_mem_region(&pdev->dev, tx_res->start,
> > +                                    resource_size(tx_res), "bf-tmfifo"))
> > +               return -EBUSY;
> > +
> > +       fifo = devm_kzalloc(&pdev->dev, sizeof(*fifo), GFP_KERNEL);
> > +       if (!fifo)
> > +               return -ENOMEM;
> > +
> > +       fifo->pdev = pdev;
> > +       platform_set_drvdata(pdev, fifo);
> > +
> > +       spin_lock_init(&fifo->spin_lock);
> > +       INIT_WORK(&fifo->work, mlxbf_tmfifo_work_handler);
> > +
> > +       timer_setup(&fifo->timer, mlxbf_tmfifo_timer, 0);
> > +
> > +       for (i = 0; i < MLXBF_TM_IRQ_CNT; i++) {
> > +               fifo->irq_info[i].index = i;
> > +               fifo->irq_info[i].fifo = fifo;
> 
> > +               fifo->irq_info[i].irq = platform_get_irq(pdev, i);
> 
> 
> > +               ret = devm_request_irq(&pdev->dev, fifo->irq_info[i].irq,
> > +                                      mlxbf_tmfifo_irq_handler, 0,
> > +                                      "tmfifo", &fifo->irq_info[i]);
> > +               if (ret) {
> > +                       dev_err(&pdev->dev, "devm_request_irq failed\n");
> > +                       fifo->irq_info[i].irq = 0;
> > +                       return ret;
> > +               }
> > +       }
> > +
> 
> > +       fifo->rx_base = devm_ioremap(&pdev->dev, rx_res->start,
> > +                                    resource_size(rx_res));
> > +       if (!fifo->rx_base)
> > +               return -ENOMEM;
> > +
> > +       fifo->tx_base = devm_ioremap(&pdev->dev, tx_res->start,
> > +                                    resource_size(tx_res));
> > +       if (!fifo->tx_base)
> > +               return -ENOMEM;
> 
> Switch to devm_ioremap_resource().
> However, I think you probably need memremap().
> 
> > +       mutex_init(&fifo->lock);
> 
> Isn't too late for initializing this one?
> 
> 
> > +/* Device remove function. */
> > +static int mlxbf_tmfifo_remove(struct platform_device *pdev)
> > +{
> > +       struct mlxbf_tmfifo *fifo = platform_get_drvdata(pdev);
> > +
> 
> > +       if (fifo)
> 
> How is it possible to be not true?
> 
> > +               mlxbf_tmfifo_cleanup(fifo);
> > +
> 
> > +       platform_set_drvdata(pdev, NULL);
> 
> Redundant.
> 
> > +
> > +       return 0;
> > +}
> 
> > +MODULE_LICENSE("GPL");
> 
> Is it correct?
> 
> --
> With Best Regards,
> Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v9] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
  2019-02-13 18:11   ` Andy Shevchenko
  2019-02-13 18:34     ` Liming Sun
@ 2019-02-14 16:25     ` Liming Sun
  2019-02-28 15:51     ` Liming Sun
  2 siblings, 0 replies; 30+ messages in thread
From: Liming Sun @ 2019-02-14 16:25 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak,
	Linux Kernel Mailing List, Platform Driver

Thanks Andy. Please see my responses and questions on some of the comments below.

Regards,
Liming

> -----Original Message-----
> From: Andy Shevchenko <andy.shevchenko@gmail.com>
> Sent: Wednesday, February 13, 2019 1:11 PM
> To: Liming Sun <lsun@mellanox.com>
> Cc: David Woods <dwoods@mellanox.com>; Andy Shevchenko <andy@infradead.org>; Darren Hart <dvhart@infradead.org>; Vadim
> Pasternak <vadimp@mellanox.com>; Linux Kernel Mailing List <linux-kernel@vger.kernel.org>; Platform Driver <platform-driver-
> x86@vger.kernel.org>
> Subject: Re: [PATCH v9] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
> 
> On Wed, Feb 13, 2019 at 3:27 PM Liming Sun <lsun@mellanox.com> wrote:
> >
> ...
> 
> > +/* Use a union struction for 64-bit little/big endian. */
> 
> What does this mean?
> 
> > +union mlxbf_tmfifo_data_64bit {
> > +       u64 data;
> > +       __le64 data_le;
> > +};

The purpose is to send 8 bytes into the FIFO without data casting in writeq().

Below is the example with the cast.

u64 data = 0x1234;
__le64 data_le;
data_le = cpu_to_le64(data);
writeq(*(u64 *)&data_le, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);

Below is the alternative, which uses a union to avoid the cast.

union mlxbf_tmfifo_data_64bit u;
u.data = 0x1234;
u.data_le = cpu_to_le64(u.data);
writeq(u.data_le, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);

Which way would be better, or do you have any other suggestions?

> > +
> > +/* Message header used to demux data in the TmFifo. */
> > +union mlxbf_tmfifo_msg_hdr {
> > +       struct {
> > +               u8 type;                /* message type */
> > +               __be16 len;             /* payload length */
> > +               u8 unused[5];           /* reserved, set to 0 */
> > +       } __packed;
> 
> It's already packed. No?

The '__packed' is needed here. Without it, the compiler pads the structure
beyond 8 bytes, which is not desired.
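
For illustration (the exact padding is compiler/ABI dependent; the numbers
below assume the usual 2-byte alignment of __be16):

struct {
	u8 type;	/* offset 0 */
	__be16 len;	/* aligned up to offset 2 without __packed */
	u8 unused[5];	/* offsets 4..8 */
};			/* sizeof() becomes 10 instead of 8 */

With __packed, 'len' sits at offset 1 and the structure stays exactly
8 bytes, matching one FIFO word.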

>...
> > +       if (vdev_id == VIRTIO_ID_CONSOLE)
> 
> > +               tm_vdev->tx_buf = devm_kmalloc(dev,
> > +                                              MLXBF_TMFIFO_CONS_TX_BUF_SIZE,
> > +                                              GFP_KERNEL);
> 
> Are you sure devm_ suits here?

The 'tx_buf' is a normal buffer used to hold the console output.
It seems OK to use devm_kmalloc() so it is automatically freed
on driver detach. Please correct me if I am wrong.

>>...
> > +
> > +       fifo->tx_base = devm_ioremap(&pdev->dev, tx_res->start,
> > +                                    resource_size(tx_res));
> > +       if (!fifo->tx_base)
> > +               return -ENOMEM;
> 
> Switch to devm_ioremap_resource().
> However, I think you probably need memremap().

These are device registers accessed by the arm64 cores.
In arm64/include/asm/io.h, several of the APIs are defined identically:

#define ioremap(addr, size)		__ioremap((addr), (size), __pgprot(PROT_DEVICE_nGnRE))
#define ioremap_nocache(addr, size)	__ioremap((addr), (size), __pgprot(PROT_DEVICE_nGnRE))
#define ioremap_wt(addr, size)		__ioremap((addr), (size), __pgprot(PROT_DEVICE_nGnRE))

How about using devm_ioremap_nocache()?
It could take advantage of the devm_xxx() APIs.
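
For example (untested sketch, same arguments as the current
devm_ioremap() calls):

	fifo->rx_base = devm_ioremap_nocache(&pdev->dev, rx_res->start,
					     resource_size(rx_res));
	if (!fifo->rx_base)
		return -ENOMEM;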

>...
> Is it correct?
> 
> --
> With Best Regards,
> Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v9] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
  2019-02-13 18:11   ` Andy Shevchenko
  2019-02-13 18:34     ` Liming Sun
  2019-02-14 16:25     ` Liming Sun
@ 2019-02-28 15:51     ` Liming Sun
  2 siblings, 0 replies; 30+ messages in thread
From: Liming Sun @ 2019-02-28 15:51 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak,
	Linux Kernel Mailing List, Platform Driver

Thanks Andy for the comments. Please see the responses below.
I'll also post the v10 patch after this email.

Regards,
Liming

> -----Original Message-----
> From: Andy Shevchenko <andy.shevchenko@gmail.com>
> Sent: Wednesday, February 13, 2019 1:11 PM
> To: Liming Sun <lsun@mellanox.com>
> Cc: David Woods <dwoods@mellanox.com>; Andy Shevchenko <andy@infradead.org>; Darren Hart <dvhart@infradead.org>; Vadim
> Pasternak <vadimp@mellanox.com>; Linux Kernel Mailing List <linux-kernel@vger.kernel.org>; Platform Driver <platform-driver-
> x86@vger.kernel.org>
> Subject: Re: [PATCH v9] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
> 
> On Wed, Feb 13, 2019 at 3:27 PM Liming Sun <lsun@mellanox.com> wrote:
> >
> > This commit adds the TmFifo platform driver for Mellanox BlueField
> > Soc. TmFifo is a shared FIFO which enables external host machine
> > to exchange data with the SoC via USB or PCIe. The driver is based
> > on virtio framework and has console and network access enabled.
> 
> Thanks for an update, my comments below.
> 
> Again, to Mellanox: guys, please, establish internal mailing list for
> review and don't come with such quality of code.

Yes, the patch went through internal review. I updated the 
Reviewed-by section of the commit message.

> 
> Next time I would like to see Reviewed-by from Mellanox people I know,
> like Vadim or Leon.
> 
> > +config MLXBF_TMFIFO
> > +       tristate "Mellanox BlueField SoC TmFifo platform driver"
> 
> > +       depends on ARM64 && ACPI && VIRTIO_CONSOLE && VIRTIO_NET
> 
> Split this to three logical parts.

Updated in v10.

> 
> > +       help
> > +         Say y here to enable TmFifo support. The TmFifo driver provides
> > +          platform driver support for the TmFifo which supports console
> > +          and networking based on the virtio framework.
> 
> >  obj-$(CONFIG_MLXREG_HOTPLUG)   += mlxreg-hotplug.o
> >  obj-$(CONFIG_MLXREG_IO) += mlxreg-io.o
> > +obj-$(CONFIG_MLXBF_TMFIFO)     += mlxbf-tmfifo.o
> 
> I would suggest to keep it sorted.

Updated in v10.

> 
> > +#define MLXBF_TMFIFO_TX_DATA 0x0
> 
> I suggest to use same fixed format for offsets.
> Here, for example, 0x00 would be better.
> 
> > +#define MLXBF_TMFIFO_TX_STS__COUNT_RMASK 0x1ff
> > +#define MLXBF_TMFIFO_TX_STS__COUNT_MASK  0x1ff
> 
> #include <linux/bits.h>
> ...
> GENMASK()

Updated in v10.
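
For example, the 9-bit count masks now read (as in the v10 register
header further below):

#define MLXBF_TMFIFO_TX_STS__COUNT_RMASK	GENMASK(8, 0)
#define MLXBF_TMFIFO_TX_STS__COUNT_MASK		GENMASK(8, 0)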

> 
> > +#define MLXBF_TMFIFO_TX_CTL__LWM_RMASK 0xff
> > +#define MLXBF_TMFIFO_TX_CTL__LWM_MASK  0xff
> 
> > +#define MLXBF_TMFIFO_TX_CTL__HWM_RMASK 0xff
> > +#define MLXBF_TMFIFO_TX_CTL__HWM_MASK  0xff00
> 
> > +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RMASK 0x1ff
> > +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK  0x1ff00000000ULL
> 
> GENMASK() / GENMASK_ULL()

Updated in v10.

> 
> > +#define MLXBF_TMFIFO_RX_STS__COUNT_RMASK 0x1ff
> > +#define MLXBF_TMFIFO_RX_STS__COUNT_MASK  0x1ff
> 
> GENMASK()

Updated in v10.

> 
> > +#define MLXBF_TMFIFO_RX_CTL__LWM_RMASK 0xff
> > +#define MLXBF_TMFIFO_RX_CTL__LWM_MASK  0xff
> 
> > +#define MLXBF_TMFIFO_RX_CTL__HWM_RMASK 0xff
> > +#define MLXBF_TMFIFO_RX_CTL__HWM_MASK  0xff00
> 
> > +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK 0x1ff
> > +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK  0x1ff00000000ULL
> 
> Ditto.

Updated in v10.

> 
> > +#include <linux/acpi.h>
> > +#include <linux/byteorder/generic.h>
> > +#include <linux/bitfield.h>
> > +#include <linux/cache.h>
> > +#include <linux/device.h>
> > +#include <linux/dma-mapping.h>
> > +#include <linux/efi.h>
> > +#include <linux/io.h>
> > +#include <linux/interrupt.h>
> > +#include <linux/irq.h>
> > +#include <linux/kernel.h>
> > +#include <linux/math64.h>
> > +#include <linux/module.h>
> > +#include <linux/moduleparam.h>
> > +#include <linux/mutex.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/resource.h>
> > +#include <linux/slab.h>
> > +#include <linux/types.h>
> > +#include <linux/version.h>
> > +#include <linux/virtio.h>
> > +#include <linux/virtio_config.h>
> > +#include <linux/virtio_console.h>
> > +#include <linux/virtio_ids.h>
> > +#include <linux/virtio_net.h>
> > +#include <linux/virtio_ring.h>
> 
> Do you need all of them?

Cleaned up quite a few and updated in v10.

> 
> > +#define MLXBF_TMFIFO_VRING_SIZE                        1024
> 
> SZ_1K ?

Updated in v10.

> 
> > +/* Console Tx buffer size. */
> > +#define MLXBF_TMFIFO_CONS_TX_BUF_SIZE          (32 * 1024)
> 
> SZ_32K ?

Updated in v10.

> 
> > +/* House-keeping timer interval. */
> > +static int mlxbf_tmfifo_timer_interval = HZ / 10;
> 
> > +/* Global lock. */
> 
> Noise. Either explain what it protects, or remove.

Removed in v10.

> 
> > +static DEFINE_MUTEX(mlxbf_tmfifo_lock);
> 
> > +/* Struct declaration. */
> 
> Noise.

Removed in v10.

> 
> > +/* Structure to maintain the ring state. */
> > +struct mlxbf_tmfifo_vring {
> > +       void *va;                       /* virtual address */
> > +       dma_addr_t dma;                 /* dma address */
> > +       struct virtqueue *vq;           /* virtqueue pointer */
> > +       struct vring_desc *desc;        /* current desc */
> > +       struct vring_desc *desc_head;   /* current desc head */
> > +       int cur_len;                    /* processed len in current desc */
> > +       int rem_len;                    /* remaining length to be processed */
> > +       int size;                       /* vring size */
> > +       int align;                      /* vring alignment */
> > +       int id;                         /* vring id */
> > +       int vdev_id;                    /* TMFIFO_VDEV_xxx */
> > +       u32 pkt_len;                    /* packet total length */
> > +       u16 next_avail;                 /* next avail desc id */
> > +       struct mlxbf_tmfifo *fifo;      /* pointer back to the tmfifo */
> > +};
> 
> Perhaps kernel-doc?

Updated in v10.

> 
> > +/* Interrupt types. */
> > +enum {
> > +       MLXBF_TM_RX_LWM_IRQ,            /* Rx low water mark irq */
> > +       MLXBF_TM_RX_HWM_IRQ,            /* Rx high water mark irq */
> > +       MLXBF_TM_TX_LWM_IRQ,            /* Tx low water mark irq */
> > +       MLXBF_TM_TX_HWM_IRQ,            /* Tx high water mark irq */
> > +       MLXBF_TM_IRQ_CNT
> 
> CNT...
> 
> > +};
> > +
> > +/* Ring types (Rx & Tx). */
> > +enum {
> > +       MLXBF_TMFIFO_VRING_RX,          /* Rx ring */
> > +       MLXBF_TMFIFO_VRING_TX,          /* Tx ring */
> > +       MLXBF_TMFIFO_VRING_NUM
> 
> ...NUM
> 
> Perhaps one style for max numbers?

Updated in v10.

> 
> > +};
> 
> > +
> > +/* Structure for the virtual device. */
> > +struct mlxbf_tmfifo_vdev {
> > +       struct virtio_device vdev;      /* virtual device */
> > +       u8 status;
> > +       u64 features;
> > +       union {                         /* virtio config space */
> > +               struct virtio_console_config cons;
> > +               struct virtio_net_config net;
> > +       } config;
> 
> Describe, which field allows to distinguish what type of the data is in a union.

Added comments in v10.

> 
> > +       struct mlxbf_tmfifo_vring vrings[MLXBF_TMFIFO_VRING_NUM];
> > +       u8 *tx_buf;                     /* tx buffer */
> > +       u32 tx_head;                    /* tx buffer head */
> > +       u32 tx_tail;                    /* tx buffer tail */
> > +};
> 
> kernel-doc?

Updated in v10

> 
> > +/* Structure of the interrupt information. */
> > +struct mlxbf_tmfifo_irq_info {
> > +       struct mlxbf_tmfifo *fifo;      /* tmfifo structure */
> > +       int irq;                        /* interrupt number */
> > +       int index;                      /* array index */
> > +};
> 
> Ditto.

Updated in v10

> 
> > +
> > +/* Structure of the TmFifo information. */
> > +struct mlxbf_tmfifo {
> > +       struct mlxbf_tmfifo_vdev *vdev[MLXBF_TMFIFO_VDEV_MAX]; /* devices */
> > +       struct platform_device *pdev;   /* platform device */
> > +       struct mutex lock;              /* fifo lock */
> > +       void __iomem *rx_base;          /* mapped register base */
> > +       void __iomem *tx_base;          /* mapped register base */
> > +       int tx_fifo_size;               /* number of entries of the Tx FIFO */
> > +       int rx_fifo_size;               /* number of entries of the Rx FIFO */
> > +       unsigned long pend_events;      /* pending bits for deferred process */
> > +       struct mlxbf_tmfifo_irq_info irq_info[MLXBF_TM_IRQ_CNT]; /* irq info */
> > +       struct work_struct work;        /* work struct for deferred process */
> > +       struct timer_list timer;        /* keepalive timer */
> > +       struct mlxbf_tmfifo_vring *vring[2];    /* current Tx/Rx ring */
> > +       bool is_ready;                  /* ready flag */
> > +       spinlock_t spin_lock;           /* spin lock */
> > +};
> 
> Ditto.

Updated in v10

> 
> > +/* Use a union struction for 64-bit little/big endian. */
> 
> What does this mean?

Updated in v10 with the following comment to explain it:
/*
 * It's expected to send 64-bit little-endian value (__le64) into the TmFifo.
 * readq() and writeq() expect u64 instead. A union structure is used here
 * to workaround the explicit casting usage like writeq(*(u64 *)&data_le).
 */

> 
> > +union mlxbf_tmfifo_data_64bit {
> > +       u64 data;
> > +       __le64 data_le;
> > +};
> > +
> > +/* Message header used to demux data in the TmFifo. */
> > +union mlxbf_tmfifo_msg_hdr {
> > +       struct {
> > +               u8 type;                /* message type */
> > +               __be16 len;             /* payload length */
> > +               u8 unused[5];           /* reserved, set to 0 */
> > +       } __packed;
> 
> It's already packed. No?

It's not packed by default due to the 16-bit len. We need the '__packed'
to make sure the size of the structure is 8 bytes.

> 
> > +       union mlxbf_tmfifo_data_64bit u;        /* 64-bit data */
> > +};
> 
> > +/* MTU setting of the virtio-net interface. */
> > +#define MLXBF_TMFIFO_NET_MTU           1500
> 
> Don't we have this globally defined?

Updated in v10

> 
> > +/* Supported virtio-net features. */
> > +#define MLXBF_TMFIFO_NET_FEATURES      ((1UL << VIRTIO_NET_F_MTU) | \
> > +                                        (1UL << VIRTIO_NET_F_STATUS) | \
> > +                                        (1UL << VIRTIO_NET_F_MAC))
> 
> BIT_UL() ?

Updated in v10
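
The feature mask in v10 now reads roughly:

#define MLXBF_TMFIFO_NET_FEATURES	(BIT_ULL(VIRTIO_NET_F_MTU) | \
					 BIT_ULL(VIRTIO_NET_F_STATUS) | \
					 BIT_ULL(VIRTIO_NET_F_MAC))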

> 
> > +/* Function declarations. */
> 
> Noise.

Removed in v10

> 
> > +static int mlxbf_tmfifo_remove(struct platform_device *pdev);
> 
> Why do you need forward declaration for this?

Removed in v10

> 
> > +/* Return the consumed Tx buffer space. */
> > +static int mlxbf_tmfifo_vdev_tx_buf_len(struct mlxbf_tmfifo_vdev *vdev)
> > +{
> > +       return ((vdev->tx_tail >= vdev->tx_head) ?
> > +               (vdev->tx_tail - vdev->tx_head) :
> > +               (MLXBF_TMFIFO_CONS_TX_BUF_SIZE - vdev->tx_head +
> > +                vdev->tx_tail));
> 
> Split this for better reading.

Updated in v10
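
In v10 it is split into two returns, along these lines:

static int mlxbf_tmfifo_vdev_tx_buf_len(struct mlxbf_tmfifo_vdev *vdev)
{
	if (vdev->tx_tail >= vdev->tx_head)
		return vdev->tx_tail - vdev->tx_head;

	return MLXBF_TMFIFO_CONS_TX_BUF_SIZE - vdev->tx_head + vdev->tx_tail;
}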

> 
> > +}
> > +
> > +/* Return the available Tx buffer space. */
> > +static int mlxbf_tmfifo_vdev_tx_buf_avail(struct mlxbf_tmfifo_vdev *vdev)
> > +{
> > +       return (MLXBF_TMFIFO_CONS_TX_BUF_RSV_SIZE -
> > +               mlxbf_tmfifo_vdev_tx_buf_len(vdev));
> 
> Redundant parens.
> Moreover, you might consider temporary variable for better reading.

Updated in v10

> 
> > +}
> > +
> > +/* Update Rx/Tx buffer index pointer. */
> > +static void mlxbf_tmfifo_vdev_tx_buf_index_inc(u32 *index, u32 len)
> > +{
> > +       *index += len;
> > +       if (*index >= MLXBF_TMFIFO_CONS_TX_BUF_SIZE)
> > +               *index -= MLXBF_TMFIFO_CONS_TX_BUF_SIZE;
> > +}
> > +
> > +/* Allocate vrings for the fifo. */
> > +static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
> > +                                    struct mlxbf_tmfifo_vdev *tm_vdev,
> > +                                    int vdev_id)
> > +{
> > +       struct mlxbf_tmfifo_vring *vring;
> > +       dma_addr_t dma;
> > +       int i, size;
> > +       void *va;
> > +
> > +       for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
> > +               vring = &tm_vdev->vrings[i];
> > +               vring->fifo = fifo;
> > +               vring->size = MLXBF_TMFIFO_VRING_SIZE;
> > +               vring->align = SMP_CACHE_BYTES;
> > +               vring->id = i;
> > +               vring->vdev_id = vdev_id;
> > +
> 
> > +               size = PAGE_ALIGN(vring_size(vring->size, vring->align));
> 
> Why do you need this?
> dma_alloc_coherent() allocates memory on page granularity anyway.

Updated in v10
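
The PAGE_ALIGN() is dropped in v10, so the allocation becomes simply
(with 'dev' being a temporary pointer to tm_vdev->vdev.dev.parent, as
suggested in the next comment):

		size = vring_size(vring->size, vring->align);
		va = dma_alloc_coherent(dev, size, &dma, GFP_KERNEL);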

> 
> > +               va = dma_alloc_coherent(tm_vdev->vdev.dev.parent, size, &dma,
> > +                                       GFP_KERNEL);
> > +               if (!va) {
> 
> > +                       dev_err(tm_vdev->vdev.dev.parent,
> 
> Would be much easy if you have temporary variable for this device.

Updated in v10

> 
> > +                               "dma_alloc_coherent failed\n");
> > +                       return -ENOMEM;
> > +               }
> > +
> > +               vring->va = va;
> > +               vring->dma = dma;
> > +       }
> > +
> > +       return 0;
> > +}
> 
> > +/* Interrupt handler. */
> > +static irqreturn_t mlxbf_tmfifo_irq_handler(int irq, void *arg)
> > +{
> > +       struct mlxbf_tmfifo_irq_info *irq_info;
> > +
> > +       irq_info = (struct mlxbf_tmfifo_irq_info *)arg;
> 
> Useless casting.
> Assignment can be done in definition block.

Updated in v10
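
The handler in v10 looks roughly like this (enum name follows the v10
renaming to _MAX):

static irqreturn_t mlxbf_tmfifo_irq_handler(int irq, void *arg)
{
	struct mlxbf_tmfifo_irq_info *irq_info = arg;

	if (irq_info->index < MLXBF_TM_MAX_IRQ &&
	    !test_and_set_bit(irq_info->index, &irq_info->fifo->pend_events))
		schedule_work(&irq_info->fifo->work);

	return IRQ_HANDLED;
}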

> 
> > +       if (irq_info->index < MLXBF_TM_IRQ_CNT &&
> > +           !test_and_set_bit(irq_info->index, &irq_info->fifo->pend_events))
> > +               schedule_work(&irq_info->fifo->work);
> > +
> > +       return IRQ_HANDLED;
> > +}
> > +
> > +/* Get the next packet descriptor from the vring. */
> > +static struct vring_desc *mlxbf_tmfifo_get_next_desc(struct virtqueue *vq)
> > +{
> > +       struct mlxbf_tmfifo_vring *vring;
> > +       unsigned int idx, head;
> > +       struct vring *vr;
> > +
> > +       vr = (struct vring *)virtqueue_get_vring(vq);
> 
> Return type is different? Is it safe to cast? Why?

It's 'const' casting. Fixed in v10 to use 'const struct vring *vr' instead.
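
i.e. in v10:

	const struct vring *vr = virtqueue_get_vring(vq);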

> 
> > +       if (!vr)
> > +               return NULL;
> 
> + blank line

Updated in v10

> 
> > +       vring = (struct mlxbf_tmfifo_vring *)vq->priv;
> 
> Do you need explicit casting?

Updated in v10

> 
> > +       if (vring->next_avail == virtio16_to_cpu(vq->vdev, vr->avail->idx))
> > +               return NULL;
> 
> +blank line

Updated in v10

> 
> > +       idx = vring->next_avail % vr->num;
> > +       head = virtio16_to_cpu(vq->vdev, vr->avail->ring[idx]);
> > +       if (WARN_ON(head >= vr->num))
> > +               return NULL;
> > +       vring->next_avail++;
> > +
> > +       return &vr->desc[head];
> > +}
> > +
> > +/* Release virtio descriptor. */
> > +static void mlxbf_tmfifo_release_desc(struct virtio_device *vdev,
> > +                                     struct vring *vr, struct vring_desc *desc,
> > +                                     u32 len)
> > +{
> > +       u16 idx, vr_idx;
> > +
> > +       vr_idx = virtio16_to_cpu(vdev, vr->used->idx);
> > +       idx = vr_idx % vr->num;
> > +       vr->used->ring[idx].id = cpu_to_virtio32(vdev, desc - vr->desc);
> > +       vr->used->ring[idx].len = cpu_to_virtio32(vdev, len);
> > +
> > +       /* Virtio could poll and check the 'idx' to decide
> > +        * whether the desc is done or not. Add a memory
> > +        * barrier here to make sure the update above completes
> > +        * before updating the idx.
> > +        */
> 
> Multi-line comment style is broken.

Updated in v10

> 
> > +       mb();
> > +       vr->used->idx = cpu_to_virtio16(vdev, vr_idx + 1);
> > +}
> 
> > +/* House-keeping timer. */
> > +static void mlxbf_tmfifo_timer(struct timer_list *arg)
> > +{
> > +       struct mlxbf_tmfifo *fifo;
> 
> > +       fifo = container_of(arg, struct mlxbf_tmfifo, timer);
> 
> Can't be done in the definition block?

Updated in v10
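
i.e. the assignment moved into the definition block:

	struct mlxbf_tmfifo *fifo = container_of(arg, struct mlxbf_tmfifo, timer);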

> 
> > +       /*
> > +        * Wake up the work handler to poll the Rx FIFO in case interrupt
> > +        * missing or any leftover bytes stuck in the FIFO.
> > +        */
> > +       test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events);
> 
> How do you utilize test results?

Fixed in v10

> 
> > +
> > +       /*
> > +        * Wake up Tx handler in case virtio has queued too many packets
> > +        * and are waiting for buffer return.
> > +        */
> > +       test_and_set_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events);
> 
> Ditto.

Fixed in v10
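
In v10 the results of both calls are used to decide whether the work
needs to be scheduled, roughly:

	bool rx, tx;

	rx = !test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events);
	tx = !test_and_set_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events);

	if (rx || tx)
		schedule_work(&fifo->work);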

> 
> > +
> > +       schedule_work(&fifo->work);
> > +
> > +       mod_timer(&fifo->timer, jiffies + mlxbf_tmfifo_timer_interval);
> > +}
> 
> > +       /* Adjust the size to available space. */
> > +       if (size + sizeof(hdr) > avail * sizeof(u64))
> > +               size = avail * sizeof(u64) - sizeof(hdr);
> 
> Can avail be 0?

It won't be 0. There is a check at the beginning of this function;
the function returns early if avail is too small.
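
For reference, the check at the top of the function is along these lines
(the constant name is taken from the v10 code; the exact form may differ):

	if (avail < MLXBF_TMFIFO_DATA_MIN_WORDS)
		return;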

> 
> > +       /* Write header. */
> > +       hdr.u.data = 0;
> > +       hdr.type = VIRTIO_ID_CONSOLE;
> > +       hdr.len = htons(size);
> > +       hdr.u.data_le = cpu_to_le64(hdr.u.data);
> 
> > +       writeq(hdr.u.data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
> 
> So, this one is not protected anyhow? Potential race condition?

The spin-lock is to protect references to the 'tx_buf', not the read/write of the FIFO.
The FIFO read/write is protected by a mutex. Added a comment in v10 to avoid such
confusion.
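
The v10 comment is along these lines (wording approximate):

	/*
	 * The spin lock only protects the console tx_buf; the FIFO
	 * read/write itself is serialized by fifo->lock (mutex).
	 */
	spin_lock_irqsave(&fifo->spin_lock, flags);
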
> 
> > +
> > +       spin_lock_irqsave(&fifo->spin_lock, flags);
> > +
> > +       while (size > 0) {
> > +               addr = cons->tx_buf + cons->tx_head;
> > +
> > +               if (cons->tx_head + sizeof(u64) <=
> > +                   MLXBF_TMFIFO_CONS_TX_BUF_SIZE) {
> > +                       memcpy(&data, addr, sizeof(u64));
> > +               } else {
> > +                       partial = MLXBF_TMFIFO_CONS_TX_BUF_SIZE - cons->tx_head;
> > +                       memcpy(&data, addr, partial);
> > +                       memcpy((u8 *)&data + partial, cons->tx_buf,
> > +                              sizeof(u64) - partial);
> > +               }
> > +               writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
> > +
> > +               if (size >= sizeof(u64)) {
> > +                       mlxbf_tmfifo_vdev_tx_buf_index_inc(&cons->tx_head,
> > +                                                          sizeof(u64));
> > +                       size -= sizeof(u64);
> > +               } else {
> > +                       mlxbf_tmfifo_vdev_tx_buf_index_inc(&cons->tx_head,
> > +                                                          size);
> > +                       size = 0;
> > +               }
> > +       }
> > +
> > +       spin_unlock_irqrestore(&fifo->spin_lock, flags);
> > +}
> 
> > +       /* Rx/Tx one word (8 bytes) if not done. */
> > +       if (vring->cur_len != len)
> > +               mlxbf_tmfifo_rxtx_word(fifo, vdev, vring, desc, is_rx, avail,
> > +                                      len);
> 
> In such case better to keep it in one line.

Updated in v10

> 
> > +/* Get the array of feature bits for this device. */
> > +static u64 mlxbf_tmfifo_virtio_get_features(struct virtio_device *vdev)
> > +{
> > +       struct mlxbf_tmfifo_vdev *tm_vdev;
> > +
> > +       tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
> > +       return tm_vdev->features;
> > +}
> > +
> > +/* Confirm device features to use. */
> > +static int mlxbf_tmfifo_virtio_finalize_features(struct virtio_device *vdev)
> > +{
> > +       struct mlxbf_tmfifo_vdev *tm_vdev;
> > +
> 
> > +       tm_vdev = container_of(vdev, struct mlxbf_tmfifo_vdev, vdev);
> 
> This is candidate to be a macro
> 
> #define mlxbt_vdev_to_tmfifo(...) ...

Updated in v10
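
The macro is basically (the exact name may differ slightly):

#define mlxbf_vdev_to_tmfifo(_vdev) \
	container_of(_vdev, struct mlxbf_tmfifo_vdev, vdev)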

> 
> > +       tm_vdev->features = vdev->features;
> > +
> > +       return 0;
> > +}
> 
> > +/* Create vdev type in a tmfifo. */
> > +static int mlxbf_tmfifo_create_vdev(struct device *dev,
> > +                                   struct mlxbf_tmfifo *fifo,
> > +                                   int vdev_id, u64 features,
> > +                                   void *config, u32 size)
> > +{
> > +       struct mlxbf_tmfifo_vdev *tm_vdev;
> > +       int ret = 0;
> > +
> > +       mutex_lock(&fifo->lock);
> > +
> > +       tm_vdev = fifo->vdev[vdev_id];
> > +       if (tm_vdev) {
> > +               dev_err(dev, "vdev %d already exists\n", vdev_id);
> > +               ret = -EEXIST;
> > +               goto fail;
> > +       }
> > +
> > +       tm_vdev = devm_kzalloc(dev, sizeof(*tm_vdev), GFP_KERNEL);
> > +       if (!tm_vdev) {
> > +               ret = -ENOMEM;
> > +               goto fail;
> > +       }
> > +
> > +       tm_vdev->vdev.id.device = vdev_id;
> > +       tm_vdev->vdev.config = &mlxbf_tmfifo_virtio_config_ops;
> > +       tm_vdev->vdev.dev.parent = &fifo->pdev->dev;
> > +       tm_vdev->features = features;
> > +       if (config)
> > +               memcpy(&tm_vdev->config, config, size);
> > +       if (mlxbf_tmfifo_alloc_vrings(fifo, tm_vdev, vdev_id)) {
> > +               dev_err(dev, "unable to allocate vring\n");
> > +               ret = -ENOMEM;
> > +               goto fail;
> > +       }
> > +       if (vdev_id == VIRTIO_ID_CONSOLE)
> 
> > +               tm_vdev->tx_buf = devm_kmalloc(dev,
> > +                                              MLXBF_TMFIFO_CONS_TX_BUF_SIZE,
> > +                                              GFP_KERNEL);
> 
> Are you sure devm_ suits here?

I think it's OK. The tx_buf is normal memory for the output buffer.
The driver runs on the SoC and the TmFifo is always there, so the
buffer is allocated at init and released on module remove.

> 
> > +       fifo->vdev[vdev_id] = tm_vdev;
> > +
> > +       /* Register the virtio device. */
> > +       ret = register_virtio_device(&tm_vdev->vdev);
> > +       if (ret) {
> > +               dev_err(&fifo->pdev->dev, "register_virtio_device failed\n");
> > +               goto register_fail;
> > +       }
> > +
> > +       mutex_unlock(&fifo->lock);
> > +       return 0;
> > +
> > +register_fail:
> > +       mlxbf_tmfifo_free_vrings(fifo, vdev_id);
> > +       fifo->vdev[vdev_id] = NULL;
> > +fail:
> > +       mutex_unlock(&fifo->lock);
> > +       return ret;
> > +}
> 
> > +/* Read the configured network MAC address from efi variable. */
> > +static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
> > +{
> > +       efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
> > +       efi_status_t status;
> > +       unsigned long size;
> > +       u8 buf[6];
> > +
> > +       size = sizeof(buf);
> > +       status = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size,
> > +                                 buf);
> > +       if (status == EFI_SUCCESS && size == sizeof(buf))
> > +               memcpy(mac, buf, sizeof(buf));
> > +}
> 
> Shouldn't be rather helper in EFI lib in kernel?

It's a little strange that there seems to be no such existing lib function.
I searched a bit in the kernel tree; the files below call efi.get_variable()
directly.
arch/x86/kernel/ima_arch.c
drivers/scsi/isci/probe_roms.c
security/integrity/platform_certs/load_uefi.c

> 
> > +/* Probe the TMFIFO. */
> > +static int mlxbf_tmfifo_probe(struct platform_device *pdev)
> > +{
> > +       struct virtio_net_config net_config;
> > +       struct resource *rx_res, *tx_res;
> > +       struct mlxbf_tmfifo *fifo;
> > +       int i, ret;
> > +
> > +       /* Get the resource of the Rx FIFO. */
> > +       rx_res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > +       if (!rx_res)
> > +               return -ENODEV;
> > +
> > +       /* Get the resource of the Tx FIFO. */
> > +       tx_res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
> > +       if (!tx_res)
> > +               return -ENODEV;
> > +
> > +       if (!devm_request_mem_region(&pdev->dev, rx_res->start,
> > +                                    resource_size(rx_res), "bf-tmfifo"))
> > +               return -EBUSY;
> > +
> > +       if (!devm_request_mem_region(&pdev->dev, tx_res->start,
> > +                                    resource_size(tx_res), "bf-tmfifo"))
> > +               return -EBUSY;
> > +
> > +       fifo = devm_kzalloc(&pdev->dev, sizeof(*fifo), GFP_KERNEL);
> > +       if (!fifo)
> > +               return -ENOMEM;
> > +
> > +       fifo->pdev = pdev;
> > +       platform_set_drvdata(pdev, fifo);
> > +
> > +       spin_lock_init(&fifo->spin_lock);
> > +       INIT_WORK(&fifo->work, mlxbf_tmfifo_work_handler);
> > +
> > +       timer_setup(&fifo->timer, mlxbf_tmfifo_timer, 0);
> > +
> > +       for (i = 0; i < MLXBF_TM_IRQ_CNT; i++) {
> > +               fifo->irq_info[i].index = i;
> > +               fifo->irq_info[i].fifo = fifo;
> 
> > +               fifo->irq_info[i].irq = platform_get_irq(pdev, i);
> 
> 
> > +               ret = devm_request_irq(&pdev->dev, fifo->irq_info[i].irq,
> > +                                      mlxbf_tmfifo_irq_handler, 0,
> > +                                      "tmfifo", &fifo->irq_info[i]);
> > +               if (ret) {
> > +                       dev_err(&pdev->dev, "devm_request_irq failed\n");
> > +                       fifo->irq_info[i].irq = 0;
> > +                       return ret;
> > +               }
> > +       }
> > +
> 
> > +       fifo->rx_base = devm_ioremap(&pdev->dev, rx_res->start,
> > +                                    resource_size(rx_res));
> > +       if (!fifo->rx_base)
> > +               return -ENOMEM;
> > +
> > +       fifo->tx_base = devm_ioremap(&pdev->dev, tx_res->start,
> > +                                    resource_size(tx_res));
> > +       if (!fifo->tx_base)
> > +               return -ENOMEM;
> 
> Switch to devm_ioremap_resource().
> However, I think you probably need memremap().

Updated in v10 to use devm_ioremap_resource().

The mapping is just for a few registers which are not meant to be
cacheable. Perhaps devm_ioremap_nocache() would make more
sense? I checked arm64/include/asm/io.h, and it looks like
ioremap/ioremap_nocache/ioremap_wt are defined identically
there.
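
With devm_ioremap_resource() the probe code becomes, for example:

	fifo->rx_base = devm_ioremap_resource(&pdev->dev, rx_res);
	if (IS_ERR(fifo->rx_base))
		return PTR_ERR(fifo->rx_base);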

> 
> > +       mutex_init(&fifo->lock);
> 
> Isn't too late for initializing this one?

It won't cause a problem here due to the 'is_ready'
flag, but it's definitely better to move it earlier. Updated in v10.

> 
> 
> > +/* Device remove function. */
> > +static int mlxbf_tmfifo_remove(struct platform_device *pdev)
> > +{
> > +       struct mlxbf_tmfifo *fifo = platform_get_drvdata(pdev);
> > +
> 
> > +       if (fifo)
> 
> How is it possible to be not true?

Updated in v10. Removed.

> 
> > +               mlxbf_tmfifo_cleanup(fifo);
> > +
> 
> > +       platform_set_drvdata(pdev, NULL);
> 
> Redundant.

Updated in v10. Removed.

> 
> > +
> > +       return 0;
> > +}
> 
> > +MODULE_LICENSE("GPL");
> 
> Is it correct?

Fixed in v10 and updated to MODULE_LICENSE("GPL v2");

> 
> --
> With Best Regards,
> Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v10] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
       [not found] <b143b40446c1870fb8d422b364ead95d54552be9.1527264077.git.lsun@mellanox.com>
                   ` (3 preceding siblings ...)
  2019-02-13 13:27 ` [PATCH v9] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc Liming Sun
@ 2019-02-28 15:51 ` Liming Sun
  2019-03-05 15:34   ` Andy Shevchenko
  2019-03-08 14:41 ` [PATCH v11] " Liming Sun
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: Liming Sun @ 2019-02-28 15:51 UTC (permalink / raw)
  To: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak
  Cc: Liming Sun, linux-kernel, platform-driver-x86

This commit adds the TmFifo platform driver for Mellanox BlueField
Soc. TmFifo is a shared FIFO which enables external host machine
to exchange data with the SoC via USB or PCIe. The driver is based
on virtio framework and has console and network access enabled.

Reviewed-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Liming Sun <lsun@mellanox.com>
---
v9->v10:
    Fixes for comments from Andy:
    - Use devm_ioremap_resource() instead of devm_ioremap().
    - Use kernel-doc comments.
    - Keep Makefile contents sorted.
    - Use same fixed format for offsets.
    - Use SZ_1K/SZ_32K instead of 1024/32*1024.
    - Remove unnecessary comments.
    - Use one style for max numbers.
    - More comments for mlxbf_tmfifo_vdev and mlxbf_tmfifo_data_64bit.
    - Use globally defined MTU instead of new definition.
    - Remove forward declaration of mlxbf_tmfifo_remove().
    - Remove PAGE_ALIGN() for dma_alloc_coherent)().
    - Remove the cast of "struct vring *".
    - Check return result of test_and_set_bit().
    - Add a macro mlxbt_vdev_to_tmfifo().
    - Several other minor coding style comments.
    Comment not applied:
    - "Shouldn't be rather helper in EFI lib in kernel"
      Looks like calling efi.get_variable() directly is the common
      approach in the kernel tree.
    - "this one is not protected anyhow? Potential race condition"
      In mlxbf_tmfifo_console_tx(), the spin-lock is used to protect the
      'tx_buf' only, not the FIFO writes. So there is no race condition.
    - "Is __packed needed in mlxbf_tmfifo_msg_hdr".
      Yes, it is needed to make sure the structure is 8 bytes.
    Fixes for comments from Vadim:
    - Use tab in mlxbf-tmfifo-regs.h
    - Use kernel-doc comments for struct mlxbf_tmfifo_msg_hdr and
      mlxbf_tmfifo_irq_info as well.
    - Use _MAX instead of _CNT in the macro definition to be consistent.
    - Fix the MODULE_LICENSE.
    - Use BIT_ULL() instead of BIT().
    - Remove argument of 'avail' for mlxbf_tmfifo_rxtx_header() and
      mlxbf_tmfifo_rxtx_word()
    - Revise logic in mlxbf_tmfifo_rxtx_one_desc() to remove the
      WARN_ON().
    - Change "union mlxbf_tmfifo_u64 u" to "union mlxbf_tmfifo_u64 buf"
      in mlxbf_tmfifo_rxtx_word().
    - Change data type of vring_change from 'int' to 'bool'.
    - Remove the blank lines after Signed-off.
    - Don't use declarations in the middle.
    - Make the network header initialization in some more elegant way.
    - Change label done to mlxbf_tmfifo_desc_done.
    - Remove some unnecessary comments, and several other misc coding
      style comments.
    - Simplify code logic in mlxbf_tmfifo_virtio_notify()
    New changes by Liming:
    - Simplify the Rx/Tx function arguments to make them more readable.
v8->v9:
    Fixes for comments from Andy:
    - Use modern devm_xxx() API instead.
    Fixes for comments from Vadim:
    - Split the Rx/Tx function into smaller functions.
    - File name, copyright information.
    - Function and variable name conversion.
    - Local variable and indent coding styles.
    - Remove unnecessary 'inline' declarations.
    - Use devm_xxx() APIs.
    - Move the efi_char16_t MAC address definition to global.
    - Fix warnings reported by 'checkpatch --strict'.
    - Fix warnings reported by 'make CF="-D__CHECK_ENDIAN__"'.
    - Change 'select VIRTIO_xxx' to 'depends on VIRTIO_xxx' in Kconfig.
    - Merge mlxbf_tmfifo_vdev_tx_buf_push() and
      mlxbf_tmfifo_vdev_tx_buf_pop().
    - Add union to avoid casting between __le64 and u64.
    - Several other misc coding style comments.
    New changes by Liming:
    - Removed the DT binding documentation since only ACPI is
      supported for now by UEFI on the SoC.
v8: Re-submit under drivers/platform/mellanox for the target-side
    platform driver only.
v7: Added host side drivers into the same patch set.
v5~v6: Coding style fix.
v1~v4: Initial version for directory drivers/soc/mellanox.
---
 drivers/platform/mellanox/Kconfig             |   12 +-
 drivers/platform/mellanox/Makefile            |    1 +
 drivers/platform/mellanox/mlxbf-tmfifo-regs.h |   63 ++
 drivers/platform/mellanox/mlxbf-tmfifo.c      | 1342 +++++++++++++++++++++++++
 4 files changed, 1417 insertions(+), 1 deletion(-)
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo-regs.h
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo.c

diff --git a/drivers/platform/mellanox/Kconfig b/drivers/platform/mellanox/Kconfig
index cd8a908..530fe7e 100644
--- a/drivers/platform/mellanox/Kconfig
+++ b/drivers/platform/mellanox/Kconfig
@@ -5,7 +5,7 @@
 
 menuconfig MELLANOX_PLATFORM
 	bool "Platform support for Mellanox hardware"
-	depends on X86 || ARM || COMPILE_TEST
+	depends on X86 || ARM || ARM64 || COMPILE_TEST
 	---help---
 	  Say Y here to get to see options for platform support for
 	  Mellanox systems. This option alone does not add any kernel code.
@@ -34,4 +34,14 @@ config MLXREG_IO
 	  to system resets operation, system reset causes monitoring and some
 	  kinds of mux selection.
 
+config MLXBF_TMFIFO
+	tristate "Mellanox BlueField SoC TmFifo platform driver"
+	depends on ARM64
+	depends on ACPI
+	depends on VIRTIO_CONSOLE && VIRTIO_NET
+	help
+	  Say y here to enable TmFifo support. The TmFifo driver provides
+          platform driver support for the TmFifo which supports console
+          and networking based on the virtio framework.
+
 endif # MELLANOX_PLATFORM
diff --git a/drivers/platform/mellanox/Makefile b/drivers/platform/mellanox/Makefile
index 57074d9c..a229bda1 100644
--- a/drivers/platform/mellanox/Makefile
+++ b/drivers/platform/mellanox/Makefile
@@ -3,5 +3,6 @@
 # Makefile for linux/drivers/platform/mellanox
 # Mellanox Platform-Specific Drivers
 #
+obj-$(CONFIG_MLXBF_TMFIFO)	+= mlxbf-tmfifo.o
 obj-$(CONFIG_MLXREG_HOTPLUG)	+= mlxreg-hotplug.o
 obj-$(CONFIG_MLXREG_IO) += mlxreg-io.o
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo-regs.h b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
new file mode 100644
index 0000000..4b2bd29
--- /dev/null
+++ b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
+ */
+
+#ifndef __MLXBF_TMFIFO_REGS_H__
+#define __MLXBF_TMFIFO_REGS_H__
+
+#include <linux/types.h>
+#include <linux/bits.h>
+
+#define MLXBF_TMFIFO_TX_DATA				0x00
+#define MLXBF_TMFIFO_TX_STS				0x08
+#define MLXBF_TMFIFO_TX_STS__LENGTH			0x0001
+#define MLXBF_TMFIFO_TX_STS__COUNT_SHIFT		0
+#define MLXBF_TMFIFO_TX_STS__COUNT_WIDTH		9
+#define MLXBF_TMFIFO_TX_STS__COUNT_RESET_VAL		0
+#define MLXBF_TMFIFO_TX_STS__COUNT_RMASK		GENMASK(8, 0)
+#define MLXBF_TMFIFO_TX_STS__COUNT_MASK			GENMASK(8, 0)
+#define MLXBF_TMFIFO_TX_CTL				0x10
+#define MLXBF_TMFIFO_TX_CTL__LENGTH			0x0001
+#define MLXBF_TMFIFO_TX_CTL__LWM_SHIFT			0
+#define MLXBF_TMFIFO_TX_CTL__LWM_WIDTH			8
+#define MLXBF_TMFIFO_TX_CTL__LWM_RESET_VAL		128
+#define MLXBF_TMFIFO_TX_CTL__LWM_RMASK			GENMASK(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__LWM_MASK			GENMASK(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__HWM_SHIFT			8
+#define MLXBF_TMFIFO_TX_CTL__HWM_WIDTH			8
+#define MLXBF_TMFIFO_TX_CTL__HWM_RESET_VAL		128
+#define MLXBF_TMFIFO_TX_CTL__HWM_RMASK			GENMASK(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__HWM_MASK			GENMASK(15, 8)
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_SHIFT		32
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_WIDTH		9
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RESET_VAL	256
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RMASK		GENMASK(8, 0)
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK		GENMASK_ULL(40, 32)
+#define MLXBF_TMFIFO_RX_DATA				0x00
+#define MLXBF_TMFIFO_RX_STS				0x08
+#define MLXBF_TMFIFO_RX_STS__LENGTH			0x0001
+#define MLXBF_TMFIFO_RX_STS__COUNT_SHIFT		0
+#define MLXBF_TMFIFO_RX_STS__COUNT_WIDTH		9
+#define MLXBF_TMFIFO_RX_STS__COUNT_RESET_VAL		0
+#define MLXBF_TMFIFO_RX_STS__COUNT_RMASK		GENMASK(8, 0)
+#define MLXBF_TMFIFO_RX_STS__COUNT_MASK			GENMASK(8, 0)
+#define MLXBF_TMFIFO_RX_CTL				0x10
+#define MLXBF_TMFIFO_RX_CTL__LENGTH			0x0001
+#define MLXBF_TMFIFO_RX_CTL__LWM_SHIFT			0
+#define MLXBF_TMFIFO_RX_CTL__LWM_WIDTH			8
+#define MLXBF_TMFIFO_RX_CTL__LWM_RESET_VAL		128
+#define MLXBF_TMFIFO_RX_CTL__LWM_RMASK			GENMASK(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__LWM_MASK			GENMASK(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__HWM_SHIFT			8
+#define MLXBF_TMFIFO_RX_CTL__HWM_WIDTH			8
+#define MLXBF_TMFIFO_RX_CTL__HWM_RESET_VAL		128
+#define MLXBF_TMFIFO_RX_CTL__HWM_RMASK			GENMASK(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__HWM_MASK			GENMASK(15, 8)
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_SHIFT		32
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_WIDTH		9
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RESET_VAL	256
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK		GENMASK(8, 0)
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK		GENMASK_ULL(40, 32)
+
+#endif /* !defined(__MLXBF_TMFIFO_REGS_H__) */
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo.c b/drivers/platform/mellanox/mlxbf-tmfifo.c
new file mode 100644
index 0000000..a6626ffe1
--- /dev/null
+++ b/drivers/platform/mellanox/mlxbf-tmfifo.c
@@ -0,0 +1,1342 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Mellanox BlueField SoC TmFifo driver
+ *
+ * Copyright (C) 2019 Mellanox Technologies
+ */
+
+#include <linux/acpi.h>
+#include <linux/bitfield.h>
+#include <linux/efi.h>
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/platform_device.h>
+#include <linux/types.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_console.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_net.h>
+#include <linux/virtio_ring.h>
+
+#include "mlxbf-tmfifo-regs.h"
+
+/* Vring size. */
+#define MLXBF_TMFIFO_VRING_SIZE			SZ_1K
+
+/* Console Tx buffer size. */
+#define MLXBF_TMFIFO_CONS_TX_BUF_SIZE		SZ_32K
+
+/* Console Tx buffer size with some reservation. */
+#define MLXBF_TMFIFO_CONS_TX_BUF_RSV_SIZE	\
+	(MLXBF_TMFIFO_CONS_TX_BUF_SIZE - 8)
+
+/* House-keeping timer interval. */
+#define MLXBF_TMFIFO_TIMER_INTERVAL		(HZ / 10)
+
+/* Virtual devices sharing the TM FIFO. */
+#define MLXBF_TMFIFO_VDEV_MAX		(VIRTIO_ID_CONSOLE + 1)
+
+/*
+ * Reserve 1/16 of TmFifo space, so console messages are not starved by
+ * the networking traffic.
+ */
+#define MLXBF_TMFIFO_RESERVE_RATIO		16
+
+/* Message with data needs at least two words (for header & data). */
+#define MLXBF_TMFIFO_DATA_MIN_WORDS		2
+
+struct mlxbf_tmfifo;
+
+/**
+ * mlxbf_tmfifo_vring - Structure of the TmFifo virtual ring
+ * @va: virtual address of the ring
+ * @dma: dma address of the ring
+ * @vq: pointer to the virtio virtqueue
+ * @desc: current descriptor of the pending packet
+ * @desc_head: head descriptor of the pending packet
+ * @cur_len: processed length of the current descriptor
+ * @rem_len: remaining length of the pending packet
+ * @pkt_len: total length of the pending packet
+ * @next_avail: next avail descriptor id
+ * @num: vring size (number of descriptors)
+ * @align: vring alignment size
+ * @index: vring index
+ * @vdev_id: vring virtio id (VIRTIO_ID_xxx)
+ * @fifo: pointer to the tmfifo structure
+ */
+struct mlxbf_tmfifo_vring {
+	void *va;
+	dma_addr_t dma;
+	struct virtqueue *vq;
+	struct vring_desc *desc;
+	struct vring_desc *desc_head;
+	int cur_len;
+	int rem_len;
+	u32 pkt_len;
+	u16 next_avail;
+	int num;
+	int align;
+	int index;
+	int vdev_id;
+	struct mlxbf_tmfifo *fifo;
+};
+
+/* Interrupt types. */
+enum {
+	MLXBF_TM_RX_LWM_IRQ,
+	MLXBF_TM_RX_HWM_IRQ,
+	MLXBF_TM_TX_LWM_IRQ,
+	MLXBF_TM_TX_HWM_IRQ,
+	MLXBF_TM_MAX_IRQ
+};
+
+/* Ring types (Rx & Tx). */
+enum {
+	MLXBF_TMFIFO_VRING_RX,
+	MLXBF_TMFIFO_VRING_TX,
+	MLXBF_TMFIFO_VRING_MAX
+};
+
+/**
+ * mlxbf_tmfifo_vdev - Structure of the TmFifo virtual device
+ * @vdev: virtio device, in which the vdev.id.device field has the
+ *        VIRTIO_ID_xxx id to distinguish the virtual device.
+ * @status: status of the device
+ * @features: supported features of the device
+ * @vrings: array of tmfifo vrings of this device
+ * @config.cons: virtual console config -
+ *               select if vdev.id.device is VIRTIO_ID_CONSOLE
+ * @config.net: virtual network config -
+ *              select if vdev.id.device is VIRTIO_ID_NET
+ * @tx_head: head of the tx_buf
+ * @tx_tail: tail of the tx_buf
+ * @tx_buf: output ring buffer
+ */
+struct mlxbf_tmfifo_vdev {
+	struct virtio_device vdev;
+	u8 status;
+	u64 features;
+	struct mlxbf_tmfifo_vring vrings[MLXBF_TMFIFO_VRING_MAX];
+	union {
+		struct virtio_console_config cons;
+		struct virtio_net_config net;
+	} config;
+	u32 tx_head;
+	u32 tx_tail;
+	u8 *tx_buf;
+};
+
+/**
+ * mlxbf_tmfifo_irq_info - Structure of the interrupt information
+ * @fifo: pointer to the tmfifo structure
+ * @irq: interrupt number
+ * @index: index into the interrupt array
+ */
+struct mlxbf_tmfifo_irq_info {
+	struct mlxbf_tmfifo *fifo;
+	int irq;
+	int index;
+};
+
+/**
+ * mlxbf_tmfifo - Structure of the TmFifo
+ * @vdev: array of the virtual devices running over the TmFifo
+ * @pdev: platform device
+ * @lock: lock to protect the TmFifo access
+ * @rx_base: mapped register base address for the Rx fifo
+ * @tx_base: mapped register base address for the Tx fifo
+ * @rx_fifo_size: number of entries of the Rx fifo
+ * @tx_fifo_size: number of entries of the Tx fifo
+ * @pend_events: pending bits for deferred events
+ * @irq_info: interrupt information
+ * @work: work struct for deferred process
+ * @timer: background timer
+ * @vring: Tx/Rx ring
+ * @spin_lock: spin lock
+ * @is_ready: ready flag
+ */
+struct mlxbf_tmfifo {
+	struct mlxbf_tmfifo_vdev *vdev[MLXBF_TMFIFO_VDEV_MAX];
+	struct platform_device *pdev;
+	struct mutex lock;		/* TmFifo lock */
+	void __iomem *rx_base;
+	void __iomem *tx_base;
+	int rx_fifo_size;
+	int tx_fifo_size;
+	unsigned long pend_events;
+	struct mlxbf_tmfifo_irq_info irq_info[MLXBF_TM_MAX_IRQ];
+	struct work_struct work;
+	struct timer_list timer;
+	struct mlxbf_tmfifo_vring *vring[2];
+	spinlock_t spin_lock;		/* spin lock */
+	bool is_ready;
+};
+
+/**
+ * mlxbf_tmfifo_u64 - Union of 64-bit data
+ * @data - 64-bit data in host byte order
+ * @data_le - 64-bit data in little-endian byte order
+ *
+ * It's expected to send 64-bit little-endian value (__le64) into the TmFifo.
+ * readq() and writeq() expect u64 instead. A union structure is used here
+ * to workaround the explicit casting usage like writeq(*(u64 *)&data_le).
+ */
+union mlxbf_tmfifo_u64 {
+	u64 data;
+	__le64 data_le;
+};
+
+/**
+ * mlxbf_tmfifo_msg_hdr - Structure of the TmFifo message header
+ * @type: message type
+ * @len: payload length
+ * @u: 64-bit union data
+ */
+union mlxbf_tmfifo_msg_hdr {
+	struct {
+		u8 type;
+		__be16 len;
+		u8 unused[5];
+	} __packed;
+	union mlxbf_tmfifo_u64 u;
+};
+
+/*
+ * Default MAC.
+ * This MAC address will be read from EFI persistent variable if configured.
+ * It can also be reconfigured with standard Linux tools.
+ */
+static u8 mlxbf_tmfifo_net_default_mac[6] = {
+	0x00, 0x1A, 0xCA, 0xFF, 0xFF, 0x01};
+
+/* EFI variable name of the MAC address. */
+static efi_char16_t mlxbf_tmfifo_efi_name[] = L"RshimMacAddr";
+
+/* Maximum L2 header length. */
+#define MLXBF_TMFIFO_NET_L2_OVERHEAD	36
+
+/* Supported virtio-net features. */
+#define MLXBF_TMFIFO_NET_FEATURES	(BIT_ULL(VIRTIO_NET_F_MTU) | \
+					 BIT_ULL(VIRTIO_NET_F_STATUS) | \
+					 BIT_ULL(VIRTIO_NET_F_MAC))
+
+#define mlxbf_vdev_to_tmfifo(dev)	\
+	container_of(dev, struct mlxbf_tmfifo_vdev, vdev)
+
+/* Console output is buffered and can be accessed with the functions below. */
+
+/* Return the consumed Tx buffer space. */
+static int mlxbf_tmfifo_vdev_tx_buf_len(struct mlxbf_tmfifo_vdev *tm_vdev)
+{
+	int len;
+
+	if (tm_vdev->tx_tail >= tm_vdev->tx_head)
+		len = tm_vdev->tx_tail - tm_vdev->tx_head;
+	else
+		len = MLXBF_TMFIFO_CONS_TX_BUF_SIZE - tm_vdev->tx_head +
+			tm_vdev->tx_tail;
+
+	return len;
+}
+
+/* Return the available Tx buffer space. */
+static int mlxbf_tmfifo_vdev_tx_buf_avail(struct mlxbf_tmfifo_vdev *tm_vdev)
+{
+	int len;
+
+	len = MLXBF_TMFIFO_CONS_TX_BUF_RSV_SIZE -
+		mlxbf_tmfifo_vdev_tx_buf_len(tm_vdev);
+
+	return len;
+}
+
+/* Update Rx/Tx buffer index pointer. */
+static void mlxbf_tmfifo_vdev_tx_buf_index_inc(u32 *index, u32 len)
+{
+	*index += len;
+	if (*index >= MLXBF_TMFIFO_CONS_TX_BUF_SIZE)
+		*index -= MLXBF_TMFIFO_CONS_TX_BUF_SIZE;
+}
+
+/* Allocate vrings for the fifo. */
+static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
+				     struct mlxbf_tmfifo_vdev *tm_vdev)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	struct device *dev;
+	dma_addr_t dma;
+	int i, size;
+	void *va;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+		vring->fifo = fifo;
+		vring->num = MLXBF_TMFIFO_VRING_SIZE;
+		vring->align = SMP_CACHE_BYTES;
+		vring->index = i;
+		vring->vdev_id = tm_vdev->vdev.id.device;
+		dev = &tm_vdev->vdev.dev;
+
+		size = vring_size(vring->num, vring->align);
+		va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL);
+		if (!va) {
+			dev_err(dev->parent, "dma_alloc_coherent failed\n");
+			return -ENOMEM;
+		}
+
+		vring->va = va;
+		vring->dma = dma;
+	}
+
+	return 0;
+}
+
+/* Free vrings of the fifo device. */
+static void mlxbf_tmfifo_free_vrings(struct mlxbf_tmfifo *fifo,
+				     struct mlxbf_tmfifo_vdev *tm_vdev)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	int i, size;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+		if (vring->va) {
+			size = vring_size(vring->num, vring->align);
+			dma_free_coherent(tm_vdev->vdev.dev.parent, size,
+					  vring->va, vring->dma);
+			vring->va = NULL;
+			if (vring->vq) {
+				vring_del_virtqueue(vring->vq);
+				vring->vq = NULL;
+			}
+		}
+	}
+}
+
+/* Disable interrupts of the fifo device. */
+static void mlxbf_tmfifo_disable_irqs(struct mlxbf_tmfifo *fifo)
+{
+	int i, irq;
+
+	for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
+		irq = fifo->irq_info[i].irq;
+		if (irq) {
+			fifo->irq_info[i].irq = 0;
+			disable_irq(irq);
+		}
+	}
+}
+
+/* Interrupt handler. */
+static irqreturn_t mlxbf_tmfifo_irq_handler(int irq, void *arg)
+{
+	struct mlxbf_tmfifo_irq_info *irq_info = arg;
+
+	if (irq_info->index < MLXBF_TM_MAX_IRQ &&
+	    !test_and_set_bit(irq_info->index, &irq_info->fifo->pend_events))
+		schedule_work(&irq_info->fifo->work);
+
+	return IRQ_HANDLED;
+}
+
+/* Get the next packet descriptor from the vring. */
+static struct vring_desc *
+mlxbf_tmfifo_get_next_desc(struct mlxbf_tmfifo_vring *vring)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	unsigned int idx, head;
+
+	if (vring->next_avail == virtio16_to_cpu(vdev, vr->avail->idx))
+		return NULL;
+
+	idx = vring->next_avail % vr->num;
+	head = virtio16_to_cpu(vdev, vr->avail->ring[idx]);
+	if (WARN_ON(head >= vr->num))
+		return NULL;
+
+	vring->next_avail++;
+
+	return &vr->desc[head];
+}
+
+/* Release virtio descriptor. */
+static void mlxbf_tmfifo_release_desc(struct mlxbf_tmfifo_vring *vring,
+				      struct vring_desc *desc, u32 len)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	u16 idx, vr_idx;
+
+	vr_idx = virtio16_to_cpu(vdev, vr->used->idx);
+	idx = vr_idx % vr->num;
+	vr->used->ring[idx].id = cpu_to_virtio32(vdev, desc - vr->desc);
+	vr->used->ring[idx].len = cpu_to_virtio32(vdev, len);
+
+	/*
+	 * Virtio could poll and check the 'idx' to decide whether the desc is
+	 * done or not. Add a memory barrier here to make sure the update above
+	 * completes before updating the idx.
+	 */
+	mb();
+	vr->used->idx = cpu_to_virtio16(vdev, vr_idx + 1);
+}
+
+/* Get the total length of the descriptor chain. */
+static u32 mlxbf_tmfifo_get_pkt_len(struct mlxbf_tmfifo_vring *vring,
+				    struct vring_desc *desc)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	u32 len = 0, idx;
+
+	while (desc) {
+		len += virtio32_to_cpu(vdev, desc->len);
+		if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
+			break;
+		idx = virtio16_to_cpu(vdev, desc->next);
+		desc = &vr->desc[idx];
+	}
+
+	return len;
+}
+
+static void mlxbf_tmfifo_release_pending_pkt(struct mlxbf_tmfifo_vring *vring)
+{
+	struct vring_desc *desc_head;
+	u32 len = 0;
+
+	if (vring->desc_head) {
+		desc_head = vring->desc_head;
+		len = vring->pkt_len;
+	} else {
+		desc_head = mlxbf_tmfifo_get_next_desc(vring);
+		if (desc_head)
+			len = mlxbf_tmfifo_get_pkt_len(vring, desc_head);
+	}
+
+	if (desc_head)
+		mlxbf_tmfifo_release_desc(vring, desc_head, len);
+
+	vring->pkt_len = 0;
+	vring->desc = NULL;
+	vring->desc_head = NULL;
+}
+
+static void mlxbf_tmfifo_init_net_desc(struct mlxbf_tmfifo_vring *vring,
+				       struct vring_desc *desc, bool is_rx)
+{
+	struct virtio_device *vdev = vring->vq->vdev;
+	struct virtio_net_hdr *net_hdr;
+
+	net_hdr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+	memset(net_hdr, 0, sizeof(*net_hdr));
+}
+
+/* Get and initialize the next packet. */
+static struct vring_desc *
+mlxbf_tmfifo_get_next_pkt(struct mlxbf_tmfifo_vring *vring, bool is_rx)
+{
+	struct vring_desc *desc;
+
+	desc = mlxbf_tmfifo_get_next_desc(vring);
+	if (desc && is_rx && vring->vdev_id == VIRTIO_ID_NET)
+		mlxbf_tmfifo_init_net_desc(vring, desc, is_rx);
+
+	vring->desc_head = desc;
+	vring->desc = desc;
+
+	return desc;
+}
+
+/* House-keeping timer. */
+static void mlxbf_tmfifo_timer(struct timer_list *arg)
+{
+	struct mlxbf_tmfifo *fifo = container_of(arg, struct mlxbf_tmfifo,
+						 timer);
+	int more;
+
+	more = !test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events) |
+		    !test_and_set_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events);
+
+	if (more)
+		schedule_work(&fifo->work);
+
+	mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
+}
+
+/* Copy one console packet into the output buffer. */
+static void mlxbf_tmfifo_console_output_one(struct mlxbf_tmfifo_vdev *cons,
+					    struct mlxbf_tmfifo_vring *vring,
+					    struct vring_desc *desc)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = &cons->vdev;
+	u32 len, idx, seg;
+	void *addr;
+
+	while (desc) {
+		addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+		len = virtio32_to_cpu(vdev, desc->len);
+
+		if (len <= MLXBF_TMFIFO_CONS_TX_BUF_SIZE - cons->tx_tail) {
+			memcpy(cons->tx_buf + cons->tx_tail, addr, len);
+		} else {
+			seg = MLXBF_TMFIFO_CONS_TX_BUF_SIZE - cons->tx_tail;
+			memcpy(cons->tx_buf + cons->tx_tail, addr, seg);
+			addr += seg;
+			memcpy(cons->tx_buf, addr, len - seg);
+		}
+		mlxbf_tmfifo_vdev_tx_buf_index_inc(&cons->tx_tail, len);
+
+		if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
+			break;
+		idx = virtio16_to_cpu(vdev, desc->next);
+		desc = &vr->desc[idx];
+	}
+}
+
+/* Copy console data into the output buffer. */
+static void mlxbf_tmfifo_console_output(struct mlxbf_tmfifo_vdev *cons,
+					struct mlxbf_tmfifo_vring *vring)
+{
+	struct vring_desc *desc;
+	u32 len;
+
+	desc = mlxbf_tmfifo_get_next_desc(vring);
+	while (desc) {
+		/* Release the packet if not enough space. */
+		len = mlxbf_tmfifo_get_pkt_len(vring, desc);
+		if (len > mlxbf_tmfifo_vdev_tx_buf_avail(cons)) {
+			mlxbf_tmfifo_release_desc(vring, desc, len);
+			break;
+		}
+
+		mlxbf_tmfifo_console_output_one(cons, vring, desc);
+		mlxbf_tmfifo_release_desc(vring, desc, len);
+		desc = mlxbf_tmfifo_get_next_desc(vring);
+	}
+}
+
+/* Get the number of available words in Rx FIFO for receiving. */
+static int mlxbf_tmfifo_get_rx_avail(struct mlxbf_tmfifo *fifo)
+{
+	u64 sts;
+
+	sts = readq(fifo->rx_base + MLXBF_TMFIFO_RX_STS);
+	return FIELD_GET(MLXBF_TMFIFO_RX_STS__COUNT_MASK, sts);
+}
+
+/* Get the number of available words in the TmFifo for sending. */
+static int mlxbf_tmfifo_get_tx_avail(struct mlxbf_tmfifo *fifo, int vdev_id)
+{
+	int tx_reserve;
+	u64 sts;
+
+	/* Reserve some room in FIFO for console messages. */
+	if (vdev_id == VIRTIO_ID_NET)
+		tx_reserve = fifo->tx_fifo_size / MLXBF_TMFIFO_RESERVE_RATIO;
+	else
+		tx_reserve = 1;
+
+	sts = readq(fifo->tx_base + MLXBF_TMFIFO_TX_STS);
+	return (fifo->tx_fifo_size - tx_reserve -
+		FIELD_GET(MLXBF_TMFIFO_TX_STS__COUNT_MASK, sts));
+}
+
+/* Console Tx (move data from the output buffer into the TmFifo). */
+static void mlxbf_tmfifo_console_tx(struct mlxbf_tmfifo *fifo, int avail)
+{
+	union mlxbf_tmfifo_msg_hdr hdr;
+	struct mlxbf_tmfifo_vdev *cons;
+	unsigned long flags;
+	int size, partial;
+	void *addr;
+	u64 data;
+
+	/* Return if not enough space available. */
+	if (avail < MLXBF_TMFIFO_DATA_MIN_WORDS)
+		return;
+
+	cons = fifo->vdev[VIRTIO_ID_CONSOLE];
+	if (!cons || !cons->tx_buf)
+		return;
+
+	/* Return if no data to send. */
+	size = mlxbf_tmfifo_vdev_tx_buf_len(cons);
+	if (size == 0)
+		return;
+
+	/* Adjust the size to available space. */
+	if (size + sizeof(hdr) > avail * sizeof(u64))
+		size = avail * sizeof(u64) - sizeof(hdr);
+
+	/* Write header. */
+	hdr.u.data = 0;
+	hdr.type = VIRTIO_ID_CONSOLE;
+	hdr.len = htons(size);
+	hdr.u.data_le = cpu_to_le64(hdr.u.data);
+	writeq(hdr.u.data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+
+	/* Use spin-lock to protect the 'cons->tx_buf' access. */
+	spin_lock_irqsave(&fifo->spin_lock, flags);
+
+	while (size > 0) {
+		addr = cons->tx_buf + cons->tx_head;
+
+		if (cons->tx_head + sizeof(u64) <=
+		    MLXBF_TMFIFO_CONS_TX_BUF_SIZE) {
+			memcpy(&data, addr, sizeof(u64));
+		} else {
+			partial = MLXBF_TMFIFO_CONS_TX_BUF_SIZE - cons->tx_head;
+			memcpy(&data, addr, partial);
+			memcpy((u8 *)&data + partial, cons->tx_buf,
+			       sizeof(u64) - partial);
+		}
+		writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+
+		if (size >= sizeof(u64)) {
+			mlxbf_tmfifo_vdev_tx_buf_index_inc(&cons->tx_head,
+							   sizeof(u64));
+			size -= sizeof(u64);
+		} else {
+			mlxbf_tmfifo_vdev_tx_buf_index_inc(&cons->tx_head,
+							   size);
+			size = 0;
+		}
+	}
+
+	spin_unlock_irqrestore(&fifo->spin_lock, flags);
+}
+
+/* Rx/Tx one word in the descriptor buffer. */
+static void mlxbf_tmfifo_rxtx_word(struct mlxbf_tmfifo_vring *vring,
+				   struct vring_desc *desc,
+				   bool is_rx, int len)
+{
+	struct virtio_device *vdev = vring->vq->vdev;
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	union mlxbf_tmfifo_u64 buf;
+	void *addr;
+
+	/* Get the buffer address of this desc. */
+	addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+
+	/* Read a word from FIFO for Rx. */
+	if (is_rx) {
+		buf.data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
+		buf.data = le64_to_cpu(buf.data_le);
+	}
+
+	if (vring->cur_len + sizeof(u64) <= len) {
+		/* The whole word. */
+		if (is_rx)
+			memcpy(addr + vring->cur_len, &buf.data, sizeof(u64));
+		else
+			memcpy(&buf.data, addr + vring->cur_len, sizeof(u64));
+		vring->cur_len += sizeof(u64);
+	} else {
+		/* Leftover bytes. */
+		if (is_rx)
+			memcpy(addr + vring->cur_len, &buf.data,
+			       len - vring->cur_len);
+		else
+			memcpy(&buf.data, addr + vring->cur_len,
+			       len - vring->cur_len);
+		vring->cur_len = len;
+	}
+
+	/* Write the word into FIFO for Tx. */
+	if (!is_rx) {
+		buf.data_le = cpu_to_le64(buf.data);
+		writeq(buf.data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+	}
+}
+
+/*
+ * Rx/Tx packet header.
+ *
+ * In Rx case, the packet might be found to belong to a different vring since
+ * the TmFifo is shared by different services. In such case, the 'vring_change'
+ * flag is set.
+ */
+static void mlxbf_tmfifo_rxtx_header(struct mlxbf_tmfifo_vring *vring,
+				     struct vring_desc *desc,
+				     bool is_rx, bool *vring_change)
+{
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	struct virtio_net_config *config;
+	union mlxbf_tmfifo_msg_hdr hdr;
+	int vdev_id, hdr_len;
+
+	/* Read/Write packet header. */
+	if (is_rx) {
+		/* Drain one word from the FIFO. */
+		hdr.u.data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
+		hdr.u.data = le64_to_cpu(hdr.u.data_le);
+
+		/* Skip the length 0 packets (keepalive). */
+		if (hdr.len == 0)
+			return;
+
+		/* Check packet type. */
+		if (hdr.type == VIRTIO_ID_NET) {
+			vdev_id = VIRTIO_ID_NET;
+			hdr_len = sizeof(struct virtio_net_hdr);
+			config = &fifo->vdev[vdev_id]->config.net;
+			if (ntohs(hdr.len) > config->mtu +
+			    MLXBF_TMFIFO_NET_L2_OVERHEAD)
+				return;
+		} else {
+			vdev_id = VIRTIO_ID_CONSOLE;
+			hdr_len = 0;
+		}
+
+		/*
+		 * Check whether the new packet still belongs to this vring.
+		 * If not, update the pkt_len of the new vring.
+		 */
+		if (vdev_id != vring->vdev_id) {
+			struct mlxbf_tmfifo_vdev *tm_dev2 = fifo->vdev[vdev_id];
+
+			if (!tm_dev2)
+				return;
+			vring->desc = desc;
+			vring = &tm_dev2->vrings[MLXBF_TMFIFO_VRING_RX];
+			*vring_change = true;
+		}
+		vring->pkt_len = ntohs(hdr.len) + hdr_len;
+	} else {
+		/* Network virtio has an extra header. */
+		hdr_len = (vring->vdev_id == VIRTIO_ID_NET) ?
+			   sizeof(struct virtio_net_hdr) : 0;
+		vring->pkt_len = mlxbf_tmfifo_get_pkt_len(vring, desc);
+		hdr.u.data = 0;
+		hdr.type = (vring->vdev_id == VIRTIO_ID_NET) ?
+			    VIRTIO_ID_NET : VIRTIO_ID_CONSOLE;
+		hdr.len = htons(vring->pkt_len - hdr_len);
+		hdr.u.data_le = cpu_to_le64(hdr.u.data);
+		writeq(hdr.u.data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+	}
+
+	vring->cur_len = hdr_len;
+	vring->rem_len = vring->pkt_len;
+	fifo->vring[is_rx] = vring;
+}
+
+/*
+ * Rx/Tx one descriptor.
+ *
+ * Return true to indicate more data available.
+ */
+static bool mlxbf_tmfifo_rxtx_one_desc(struct mlxbf_tmfifo_vring *vring,
+				       bool is_rx, int *avail)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	struct virtio_device *vdev;
+	bool vring_change = false;
+	struct vring_desc *desc;
+	unsigned long flags;
+	u32 len, idx;
+
+	vdev = &fifo->vdev[vring->vdev_id]->vdev;
+
+	/* Get the descriptor of the next packet. */
+	if (!vring->desc) {
+		desc = mlxbf_tmfifo_get_next_pkt(vring, is_rx);
+		if (!desc)
+			return false;
+	} else {
+		desc = vring->desc;
+	}
+
+	/* Beginning of a packet. Start to Rx/Tx packet header. */
+	if (vring->pkt_len == 0) {
+		mlxbf_tmfifo_rxtx_header(vring, desc, is_rx, &vring_change);
+		(*avail)--;
+
+		/* Return if new packet is for another ring. */
+		if (vring_change)
+			return false;
+		goto mlxbf_tmfifo_desc_done;
+	}
+
+	/* Get the length of this desc. */
+	len = virtio32_to_cpu(vdev, desc->len);
+	if (len > vring->rem_len)
+		len = vring->rem_len;
+
+	/* Rx/Tx one word (8 bytes) if not done. */
+	if (vring->cur_len < len) {
+		mlxbf_tmfifo_rxtx_word(vring, desc, is_rx, len);
+		(*avail)--;
+	}
+
+	/* Check again whether it's done. */
+	if (vring->cur_len == len) {
+		vring->cur_len = 0;
+		vring->rem_len -= len;
+
+		/* Get the next desc on the chain. */
+		if (vring->rem_len > 0 &&
+		    (virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT)) {
+			idx = virtio16_to_cpu(vdev, desc->next);
+			desc = &vr->desc[idx];
+			goto mlxbf_tmfifo_desc_done;
+		}
+
+		/* Done and release the pending packet. */
+		mlxbf_tmfifo_release_pending_pkt(vring);
+		desc = NULL;
+		fifo->vring[is_rx] = NULL;
+
+		/* Notify upper layer that packet is done. */
+		spin_lock_irqsave(&fifo->spin_lock, flags);
+		vring_interrupt(0, vring->vq);
+		spin_unlock_irqrestore(&fifo->spin_lock, flags);
+	}
+
+mlxbf_tmfifo_desc_done:
+	/* Save the current desc. */
+	vring->desc = desc;
+
+	return true;
+}
+
+/* Rx & Tx processing of a queue. */
+static void mlxbf_tmfifo_rxtx(struct mlxbf_tmfifo_vring *vring, bool is_rx)
+{
+	int avail = 0, devid = vring->vdev_id;
+	struct mlxbf_tmfifo *fifo;
+	bool more;
+
+	fifo = vring->fifo;
+
+	/* Return if vdev is not ready. */
+	if (!fifo->vdev[devid])
+		return;
+
+	/* Return if another vring is running. */
+	if (fifo->vring[is_rx] && fifo->vring[is_rx] != vring)
+		return;
+
+	/* Only handle console and network for now. */
+	if (WARN_ON(devid != VIRTIO_ID_NET && devid != VIRTIO_ID_CONSOLE))
+		return;
+
+	do {
+		/* Get available FIFO space. */
+		if (avail == 0) {
+			if (is_rx)
+				avail = mlxbf_tmfifo_get_rx_avail(fifo);
+			else
+				avail = mlxbf_tmfifo_get_tx_avail(fifo, devid);
+			if (avail <= 0)
+				break;
+		}
+
+		/* Console output always comes from the Tx buffer. */
+		if (!is_rx && devid == VIRTIO_ID_CONSOLE) {
+			mlxbf_tmfifo_console_tx(fifo, avail);
+			break;
+		}
+
+		/* Handle one descriptor. */
+		more = mlxbf_tmfifo_rxtx_one_desc(vring, is_rx, &avail);
+	} while (more);
+}
+
+/* Handle Rx or Tx queues. */
+static void mlxbf_tmfifo_work_rxtx(struct mlxbf_tmfifo *fifo, int queue_id,
+				   int irq_id, bool is_rx)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct mlxbf_tmfifo_vring *vring;
+	int i;
+
+	if (!test_and_clear_bit(irq_id, &fifo->pend_events) ||
+	    !fifo->irq_info[irq_id].irq)
+		return;
+
+	for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++) {
+		tm_vdev = fifo->vdev[i];
+		if (tm_vdev) {
+			vring = &tm_vdev->vrings[queue_id];
+			if (vring->vq)
+				mlxbf_tmfifo_rxtx(vring, is_rx);
+		}
+	}
+}
+
+/* Work handler for Rx and Tx case. */
+static void mlxbf_tmfifo_work_handler(struct work_struct *work)
+{
+	struct mlxbf_tmfifo *fifo;
+
+	fifo = container_of(work, struct mlxbf_tmfifo, work);
+	if (!fifo->is_ready)
+		return;
+
+	mutex_lock(&fifo->lock);
+
+	/* Tx (Send data to the TmFifo). */
+	mlxbf_tmfifo_work_rxtx(fifo, MLXBF_TMFIFO_VRING_TX,
+			       MLXBF_TM_TX_LWM_IRQ, false);
+
+	/* Rx (Receive data from the TmFifo). */
+	mlxbf_tmfifo_work_rxtx(fifo, MLXBF_TMFIFO_VRING_RX,
+			       MLXBF_TM_RX_HWM_IRQ, true);
+
+	mutex_unlock(&fifo->lock);
+}
+
+/* The notify function is called when new buffers are posted. */
+static bool mlxbf_tmfifo_virtio_notify(struct virtqueue *vq)
+{
+	struct mlxbf_tmfifo_vring *vring = vq->priv;
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct mlxbf_tmfifo *fifo;
+	unsigned long flags;
+
+	fifo = vring->fifo;
+
+	/*
+	 * Virtio maintains vrings in pairs, even number ring for Rx
+	 * and odd number ring for Tx.
+	 */
+	if (!(vring->index & BIT(0))) {
+		if (test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events))
+			return true;
+	} else {
+		/*
+		 * Console could make blocking call with interrupts disabled.
+		 * In such case, the vring needs to be served right away. For
+		 * other cases, just set the TX LWM bit to start Tx in the
+		 * worker handler.
+		 */
+		if (vring->vdev_id == VIRTIO_ID_CONSOLE) {
+			spin_lock_irqsave(&fifo->spin_lock, flags);
+			tm_vdev = fifo->vdev[VIRTIO_ID_CONSOLE];
+			mlxbf_tmfifo_console_output(tm_vdev, vring);
+			spin_unlock_irqrestore(&fifo->spin_lock, flags);
+		} else if (test_and_set_bit(MLXBF_TM_TX_LWM_IRQ,
+					    &fifo->pend_events)) {
+			return true;
+		}
+	}
+
+	schedule_work(&fifo->work);
+
+	return true;
+}
+
+/* Get the array of feature bits for this device. */
+static u64 mlxbf_tmfifo_virtio_get_features(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	return tm_vdev->features;
+}
+
+/* Confirm device features to use. */
+static int mlxbf_tmfifo_virtio_finalize_features(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->features = vdev->features;
+
+	return 0;
+}
+
+/* Free virtqueues found by find_vqs(). */
+static void mlxbf_tmfifo_virtio_del_vqs(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+	struct mlxbf_tmfifo_vring *vring;
+	struct virtqueue *vq;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+
+		/* Release the pending packet. */
+		if (vring->desc)
+			mlxbf_tmfifo_release_pending_pkt(vring);
+		vq = vring->vq;
+		if (vq) {
+			vring->vq = NULL;
+			vring_del_virtqueue(vq);
+		}
+	}
+}
+
+/* Create and initialize the virtual queues. */
+static int mlxbf_tmfifo_virtio_find_vqs(struct virtio_device *vdev,
+					unsigned int nvqs,
+					struct virtqueue *vqs[],
+					vq_callback_t *callbacks[],
+					const char * const names[],
+					const bool *ctx,
+					struct irq_affinity *desc)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+	struct mlxbf_tmfifo_vring *vring;
+	struct virtqueue *vq;
+	int i, ret, size;
+
+	if (nvqs > ARRAY_SIZE(tm_vdev->vrings))
+		return -EINVAL;
+
+	for (i = 0; i < nvqs; ++i) {
+		if (!names[i]) {
+			ret = -EINVAL;
+			goto error;
+		}
+		vring = &tm_vdev->vrings[i];
+
+		/* zero vring */
+		size = vring_size(vring->num, vring->align);
+		memset(vring->va, 0, size);
+		vq = vring_new_virtqueue(i, vring->num, vring->align, vdev,
+					 false, false, vring->va,
+					 mlxbf_tmfifo_virtio_notify,
+					 callbacks[i], names[i]);
+		if (!vq) {
+			dev_err(&vdev->dev, "vring_new_virtqueue failed\n");
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		vqs[i] = vq;
+		vring->vq = vq;
+		vq->priv = vring;
+	}
+
+	return 0;
+
+error:
+	mlxbf_tmfifo_virtio_del_vqs(vdev);
+	return ret;
+}
+
+/* Read the status byte. */
+static u8 mlxbf_tmfifo_virtio_get_status(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	return tm_vdev->status;
+}
+
+/* Write the status byte. */
+static void mlxbf_tmfifo_virtio_set_status(struct virtio_device *vdev,
+					   u8 status)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->status = status;
+}
+
+/* Reset the device. Not much here for now. */
+static void mlxbf_tmfifo_virtio_reset(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->status = 0;
+}
+
+/* Read the value of a configuration field. */
+static void mlxbf_tmfifo_virtio_get(struct virtio_device *vdev,
+				    unsigned int offset,
+				    void *buf,
+				    unsigned int len)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	if (offset + len > sizeof(tm_vdev->config))
+		return;
+
+	memcpy(buf, (u8 *)&tm_vdev->config + offset, len);
+}
+
+/* Write the value of a configuration field. */
+static void mlxbf_tmfifo_virtio_set(struct virtio_device *vdev,
+				    unsigned int offset,
+				    const void *buf,
+				    unsigned int len)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	if (offset + len > sizeof(tm_vdev->config))
+		return;
+
+	memcpy((u8 *)&tm_vdev->config + offset, buf, len);
+}
+
+/*
+ * Nothing to do for now. This function is needed to avoid warnings
+ * when the device is released in device_release().
+ */
+static void tmfifo_virtio_dev_release(struct device *dev)
+{
+}
+
+/* Virtio config operations. */
+static const struct virtio_config_ops mlxbf_tmfifo_virtio_config_ops = {
+	.get_features = mlxbf_tmfifo_virtio_get_features,
+	.finalize_features = mlxbf_tmfifo_virtio_finalize_features,
+	.find_vqs = mlxbf_tmfifo_virtio_find_vqs,
+	.del_vqs = mlxbf_tmfifo_virtio_del_vqs,
+	.reset = mlxbf_tmfifo_virtio_reset,
+	.set_status = mlxbf_tmfifo_virtio_set_status,
+	.get_status = mlxbf_tmfifo_virtio_get_status,
+	.get = mlxbf_tmfifo_virtio_get,
+	.set = mlxbf_tmfifo_virtio_set,
+};
+
+/* Create vdev type in a tmfifo. */
+static int mlxbf_tmfifo_create_vdev(struct device *dev,
+				    struct mlxbf_tmfifo *fifo,
+				    int vdev_id, u64 features,
+				    void *config, u32 size)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	int ret;
+
+	mutex_lock(&fifo->lock);
+
+	tm_vdev = fifo->vdev[vdev_id];
+	if (tm_vdev) {
+		dev_err(dev, "vdev %d already exists\n", vdev_id);
+		ret = -EEXIST;
+		goto fail;
+	}
+
+	tm_vdev = devm_kzalloc(dev, sizeof(*tm_vdev), GFP_KERNEL);
+	if (!tm_vdev) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	tm_vdev->vdev.id.device = vdev_id;
+	tm_vdev->vdev.config = &mlxbf_tmfifo_virtio_config_ops;
+	tm_vdev->vdev.dev.parent = &fifo->pdev->dev;
+	tm_vdev->vdev.dev.release = tmfifo_virtio_dev_release;
+	tm_vdev->features = features;
+	if (config)
+		memcpy(&tm_vdev->config, config, size);
+
+	if (mlxbf_tmfifo_alloc_vrings(fifo, tm_vdev)) {
+		dev_err(dev, "unable to allocate vring\n");
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	/* Allocate an output buffer for the console device. */
+	if (vdev_id == VIRTIO_ID_CONSOLE)
+		tm_vdev->tx_buf = devm_kmalloc(dev,
+					       MLXBF_TMFIFO_CONS_TX_BUF_SIZE,
+					       GFP_KERNEL);
+	fifo->vdev[vdev_id] = tm_vdev;
+
+	/* Register the virtio device. */
+	ret = register_virtio_device(&tm_vdev->vdev);
+	if (ret) {
+		dev_err(&fifo->pdev->dev, "register_virtio_device failed\n");
+		goto register_fail;
+	}
+
+	mutex_unlock(&fifo->lock);
+	return 0;
+
+register_fail:
+	mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
+	fifo->vdev[vdev_id] = NULL;
+fail:
+	mutex_unlock(&fifo->lock);
+	return ret;
+}
+
+/* Delete vdev type from a tmfifo. */
+static int mlxbf_tmfifo_delete_vdev(struct mlxbf_tmfifo *fifo, int vdev_id)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	mutex_lock(&fifo->lock);
+
+	/* Unregister vdev. */
+	tm_vdev = fifo->vdev[vdev_id];
+	if (tm_vdev) {
+		unregister_virtio_device(&tm_vdev->vdev);
+		mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
+		fifo->vdev[vdev_id] = NULL;
+	}
+
+	mutex_unlock(&fifo->lock);
+
+	return 0;
+}
+
+/* Read the configured network MAC address from efi variable. */
+static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
+{
+	efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
+	efi_status_t status;
+	unsigned long size;
+	u8 buf[6];
+
+	size = sizeof(buf);
+	status = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size,
+				  buf);
+	if (status == EFI_SUCCESS && size == sizeof(buf))
+		memcpy(mac, buf, sizeof(buf));
+}
+
+/* Set TmFifo thresholds which are used to trigger interrupts. */
+static void mlxbf_tmfifo_set_threshold(struct mlxbf_tmfifo *fifo)
+{
+	u64 ctl;
+
+	/* Get Tx FIFO size and set the low/high watermark. */
+	ctl = readq(fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
+	fifo->tx_fifo_size =
+		FIELD_GET(MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK, ctl);
+	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__LWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__LWM_MASK,
+			   fifo->tx_fifo_size / 2);
+	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__HWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__HWM_MASK,
+			   fifo->tx_fifo_size - 1);
+	writeq(ctl, fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
+
+	/* Get Rx FIFO size and set the low/high watermark. */
+	ctl = readq(fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
+	fifo->rx_fifo_size =
+		FIELD_GET(MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK, ctl);
+	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__LWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__LWM_MASK, 0);
+	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__HWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__HWM_MASK, 1);
+	writeq(ctl, fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
+}
+
+static void mlxbf_tmfifo_cleanup(struct mlxbf_tmfifo *fifo)
+{
+	int i;
+
+	fifo->is_ready = false;
+	del_timer_sync(&fifo->timer);
+	mlxbf_tmfifo_disable_irqs(fifo);
+	cancel_work_sync(&fifo->work);
+	for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++)
+		mlxbf_tmfifo_delete_vdev(fifo, i);
+}
+
+/* Probe the TMFIFO. */
+static int mlxbf_tmfifo_probe(struct platform_device *pdev)
+{
+	struct virtio_net_config net_config;
+	struct mlxbf_tmfifo *fifo;
+	struct resource *res;
+	int i, ret;
+
+	fifo = devm_kzalloc(&pdev->dev, sizeof(*fifo), GFP_KERNEL);
+	if (!fifo)
+		return -ENOMEM;
+
+	spin_lock_init(&fifo->spin_lock);
+	INIT_WORK(&fifo->work, mlxbf_tmfifo_work_handler);
+	mutex_init(&fifo->lock);
+
+	/* Get the resource of the Rx FIFO. */
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	fifo->rx_base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(fifo->rx_base))
+		return PTR_ERR(fifo->rx_base);
+
+	/* Get the resource of the Tx FIFO. */
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+	fifo->tx_base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(fifo->tx_base))
+		return PTR_ERR(fifo->tx_base);
+
+	fifo->pdev = pdev;
+	platform_set_drvdata(pdev, fifo);
+
+	timer_setup(&fifo->timer, mlxbf_tmfifo_timer, 0);
+
+	for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
+		fifo->irq_info[i].index = i;
+		fifo->irq_info[i].fifo = fifo;
+		fifo->irq_info[i].irq = platform_get_irq(pdev, i);
+		ret = devm_request_irq(&pdev->dev, fifo->irq_info[i].irq,
+				       mlxbf_tmfifo_irq_handler, 0,
+				       "tmfifo", &fifo->irq_info[i]);
+		if (ret) {
+			dev_err(&pdev->dev, "devm_request_irq failed\n");
+			fifo->irq_info[i].irq = 0;
+			return ret;
+		}
+	}
+
+	mlxbf_tmfifo_set_threshold(fifo);
+
+	/* Create the console vdev. */
+	ret = mlxbf_tmfifo_create_vdev(&pdev->dev, fifo, VIRTIO_ID_CONSOLE, 0,
+				       NULL, 0);
+	if (ret)
+		goto fail;
+
+	/* Create the network vdev. */
+	memset(&net_config, 0, sizeof(net_config));
+	net_config.mtu = ETH_DATA_LEN;
+	net_config.status = VIRTIO_NET_S_LINK_UP;
+	memcpy(net_config.mac, mlxbf_tmfifo_net_default_mac, 6);
+	mlxbf_tmfifo_get_cfg_mac(net_config.mac);
+	ret = mlxbf_tmfifo_create_vdev(&pdev->dev, fifo, VIRTIO_ID_NET,
+				       MLXBF_TMFIFO_NET_FEATURES, &net_config,
+				       sizeof(net_config));
+	if (ret)
+		goto fail;
+
+	mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
+
+	fifo->is_ready = true;
+	return 0;
+
+fail:
+	mlxbf_tmfifo_cleanup(fifo);
+	return ret;
+}
+
+/* Device remove function. */
+static int mlxbf_tmfifo_remove(struct platform_device *pdev)
+{
+	struct mlxbf_tmfifo *fifo = platform_get_drvdata(pdev);
+
+	mlxbf_tmfifo_cleanup(fifo);
+
+	return 0;
+}
+
+static const struct acpi_device_id mlxbf_tmfifo_acpi_match[] = {
+	{ "MLNXBF01", 0 },
+	{}
+};
+MODULE_DEVICE_TABLE(acpi, mlxbf_tmfifo_acpi_match);
+
+static struct platform_driver mlxbf_tmfifo_driver = {
+	.probe = mlxbf_tmfifo_probe,
+	.remove = mlxbf_tmfifo_remove,
+	.driver = {
+		.name = "bf-tmfifo",
+		.acpi_match_table = mlxbf_tmfifo_acpi_match,
+	},
+};
+
+module_platform_driver(mlxbf_tmfifo_driver);
+
+MODULE_DESCRIPTION("Mellanox BlueField SoC TmFifo Driver");
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Mellanox Technologies");
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v10] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
  2019-02-28 15:51 ` [PATCH v10] " Liming Sun
@ 2019-03-05 15:34   ` Andy Shevchenko
  2019-03-06 20:00     ` Liming Sun
  0 siblings, 1 reply; 30+ messages in thread
From: Andy Shevchenko @ 2019-03-05 15:34 UTC (permalink / raw)
  To: Liming Sun
  Cc: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak,
	Linux Kernel Mailing List, Platform Driver

On Thu, Feb 28, 2019 at 5:51 PM Liming Sun <lsun@mellanox.com> wrote:
>
> This commit adds the TmFifo platform driver for Mellanox BlueField
> Soc. TmFifo is a shared FIFO which enables external host machine
> to exchange data with the SoC via USB or PCIe. The driver is based
> on virtio framework and has console and network access enabled.

Thank you for an update.

Unfortunately more work is needed. My comments below.

> +#define MLXBF_TMFIFO_TX_STS__COUNT_RMASK               GENMASK(8, 0)
> +#define MLXBF_TMFIFO_TX_STS__COUNT_MASK                        GENMASK(8, 0)

> +#define MLXBF_TMFIFO_TX_CTL__LWM_RMASK                 GENMASK(7, 0)
> +#define MLXBF_TMFIFO_TX_CTL__LWM_MASK                  GENMASK(7, 0)

> +#define MLXBF_TMFIFO_TX_CTL__HWM_RMASK                 GENMASK(7, 0)
> +#define MLXBF_TMFIFO_TX_CTL__HWM_MASK                  GENMASK(15, 8)

> +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RMASK         GENMASK(8, 0)
> +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK          GENMASK_ULL(40, 32)

> +#define MLXBF_TMFIFO_RX_STS__COUNT_RMASK               GENMASK(8, 0)
> +#define MLXBF_TMFIFO_RX_STS__COUNT_MASK                        GENMASK(8, 0)

> +#define MLXBF_TMFIFO_RX_CTL__LWM_RMASK                 GENMASK(7, 0)
> +#define MLXBF_TMFIFO_RX_CTL__LWM_MASK                  GENMASK(7, 0)

> +#define MLXBF_TMFIFO_RX_CTL__HWM_RMASK                 GENMASK(7, 0)
> +#define MLXBF_TMFIFO_RX_CTL__HWM_MASK                  GENMASK(15, 8)

> +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK         GENMASK(8, 0)
> +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK          GENMASK_ULL(40, 32)

Since two of them have _ULL suffix I'm wondering if you have checked
for side effects on the rest, i.e. if you operate with 64-bit variable
and use something like ~MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK, it may
give you interesting results.
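
(For illustration, a hypothetical snippet, not from the patch, showing what
can go wrong on a 32-bit build where GENMASK() is only unsigned long wide:

	u64 ctl = readq(fifo->rx_base + MLXBF_TMFIFO_RX_CTL);

	/* On 32-bit, ~GENMASK(8, 0) is only 0xfffffe00, so the AND below
	 * zero-extends and also clears bits 63:32 of ctl.
	 */
	ctl &= ~MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK;

i.e. the complement silently wipes the upper half of the register value.)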

> +#define MLXBF_TMFIFO_TIMER_INTERVAL            (HZ / 10)

> +/**
> + * mlxbf_tmfifo_u64 - Union of 64-bit data
> + * @data - 64-bit data in host byte order
> + * @data_le - 64-bit data in little-endian byte order
> + *
> + * It's expected to send 64-bit little-endian value (__le64) into the TmFifo.
> + * readq() and writeq() expect u64 instead. A union structure is used here
> + * to workaround the explicit casting usage like writeq(*(u64 *)&data_le).
> + */

How do you know what readq()/writeq() do with the data? Is that true on
all architectures?
How does the endianness conversion affect the actual data?

> +union mlxbf_tmfifo_u64 {
> +       u64 data;
> +       __le64 data_le;
> +};

> +/*
> + * Default MAC.
> + * This MAC address will be read from EFI persistent variable if configured.
> + * It can also be reconfigured with standard Linux tools.
> + */
> +static u8 mlxbf_tmfifo_net_default_mac[6] = {
> +       0x00, 0x1A, 0xCA, 0xFF, 0xFF, 0x01};

> +#define mlxbf_vdev_to_tmfifo(dev)      \
> +       container_of(dev, struct mlxbf_tmfifo_vdev, vdev)

One line?

> +/* Return the consumed Tx buffer space. */
> +static int mlxbf_tmfifo_vdev_tx_buf_len(struct mlxbf_tmfifo_vdev *tm_vdev)
> +{
> +       int len;
> +
> +       if (tm_vdev->tx_tail >= tm_vdev->tx_head)
> +               len = tm_vdev->tx_tail - tm_vdev->tx_head;
> +       else
> +               len = MLXBF_TMFIFO_CONS_TX_BUF_SIZE - tm_vdev->tx_head +
> +                       tm_vdev->tx_tail;
> +       return len;
> +}

Is this custom implementation of some kind of circ_buf?

> +/* Allocate vrings for the fifo. */
> +static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
> +                                    struct mlxbf_tmfifo_vdev *tm_vdev)
> +{
> +       struct mlxbf_tmfifo_vring *vring;
> +       struct device *dev;
> +       dma_addr_t dma;
> +       int i, size;
> +       void *va;
> +
> +       for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
> +               vring = &tm_vdev->vrings[i];
> +               vring->fifo = fifo;
> +               vring->num = MLXBF_TMFIFO_VRING_SIZE;
> +               vring->align = SMP_CACHE_BYTES;
> +               vring->index = i;
> +               vring->vdev_id = tm_vdev->vdev.id.device;
> +               dev = &tm_vdev->vdev.dev;
> +
> +               size = vring_size(vring->num, vring->align);
> +               va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL);
> +               if (!va) {

> +                       dev_err(dev->parent, "dma_alloc_coherent failed\n");

And how do you clean previously allocated items?

> +                       return -ENOMEM;
> +               }
> +
> +               vring->va = va;
> +               vring->dma = dma;
> +       }
> +
> +       return 0;
> +}

> +/* Disable interrupts of the fifo device. */
> +static void mlxbf_tmfifo_disable_irqs(struct mlxbf_tmfifo *fifo)
> +{
> +       int i, irq;
> +
> +       for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
> +               irq = fifo->irq_info[i].irq;

> +               if (irq) {

I don't think this check is needed if you can guarantee that it has no
stale records.

> +                       fifo->irq_info[i].irq = 0;
> +                       disable_irq(irq);
> +               }
> +       }
> +}

> +/* Get the number of available words in the TmFifo for sending. */
> +static int mlxbf_tmfifo_get_tx_avail(struct mlxbf_tmfifo *fifo, int vdev_id)
> +{
> +       int tx_reserve;
> +       u64 sts;
> +
> +       /* Reserve some room in FIFO for console messages. */
> +       if (vdev_id == VIRTIO_ID_NET)
> +               tx_reserve = fifo->tx_fifo_size / MLXBF_TMFIFO_RESERVE_RATIO;
> +       else
> +               tx_reserve = 1;
> +
> +       sts = readq(fifo->tx_base + MLXBF_TMFIFO_TX_STS);

> +       return (fifo->tx_fifo_size - tx_reserve -
> +               FIELD_GET(MLXBF_TMFIFO_TX_STS__COUNT_MASK, sts));

Redundant parens.
Moreover, consider

u32 count; // or whatever suits for FIELD_GET().
...

sts = readq(...);
count = FIELD_GET(...);
return ...;

> +}

> +       while (size > 0) {
> +               addr = cons->tx_buf + cons->tx_head;
> +
> +               if (cons->tx_head + sizeof(u64) <=
> +                   MLXBF_TMFIFO_CONS_TX_BUF_SIZE) {
> +                       memcpy(&data, addr, sizeof(u64));
> +               } else {
> +                       partial = MLXBF_TMFIFO_CONS_TX_BUF_SIZE - cons->tx_head;
> +                       memcpy(&data, addr, partial);

> +                       memcpy((u8 *)&data + partial, cons->tx_buf,
> +                              sizeof(u64) - partial);

Unaligned access?!

> +               }

> +               buf.data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
> +               buf.data = le64_to_cpu(buf.data_le);

Are you sure this is correct?
How did you test this on BE architectures?

> +       tm_vdev = devm_kzalloc(dev, sizeof(*tm_vdev), GFP_KERNEL);

Is it appropriate use of devm_* ?

> +       if (!tm_vdev) {
> +               ret = -ENOMEM;
> +               goto fail;
> +       }

> +/* Read the configured network MAC address from efi variable. */
> +static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
> +{
> +       efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
> +       efi_status_t status;
> +       unsigned long size;

> +       u8 buf[6];

ETH_ALEN ?

> +
> +       size = sizeof(buf);

Ditto.

> +       status = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size,
> +                                 buf);

> +       if (status == EFI_SUCCESS && size == sizeof(buf))

Ditto.

> +               memcpy(mac, buf, sizeof(buf));

ether_addr_copy().

> +}

> +       memcpy(net_config.mac, mlxbf_tmfifo_net_default_mac, 6);

ether_addr_copy()...

> +       mlxbf_tmfifo_get_cfg_mac(net_config.mac);

... but actually above should be part of this function.
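
Something along these lines, perhaps (just a sketch, reusing your existing
EFI variable name and default MAC; ether_addr_copy() is from
<linux/etherdevice.h>):

static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
{
	efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
	unsigned long size = ETH_ALEN;
	u8 buf[ETH_ALEN];
	efi_status_t status;

	status = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL,
				  &size, buf);
	if (status == EFI_SUCCESS && size == ETH_ALEN)
		ether_addr_copy(mac, buf);	/* EFI-configured MAC */
	else
		ether_addr_copy(mac, mlxbf_tmfifo_net_default_mac);
}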

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v10] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
  2019-03-05 15:34   ` Andy Shevchenko
@ 2019-03-06 20:00     ` Liming Sun
  2019-03-08 14:44       ` Liming Sun
  0 siblings, 1 reply; 30+ messages in thread
From: Liming Sun @ 2019-03-06 20:00 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak,
	Linux Kernel Mailing List, Platform Driver

Thanks Andy! Please see my responses below. If there are no further comments, I'll post v11 after more testing.

Regards,
Liming

> -----Original Message-----
> From: Andy Shevchenko <andy.shevchenko@gmail.com>
> Sent: Tuesday, March 5, 2019 10:34 AM
> To: Liming Sun <lsun@mellanox.com>
> Cc: David Woods <dwoods@mellanox.com>; Andy Shevchenko <andy@infradead.org>; Darren Hart <dvhart@infradead.org>; Vadim
> Pasternak <vadimp@mellanox.com>; Linux Kernel Mailing List <linux-kernel@vger.kernel.org>; Platform Driver <platform-driver-
> x86@vger.kernel.org>
> Subject: Re: [PATCH v10] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
> 
> On Thu, Feb 28, 2019 at 5:51 PM Liming Sun <lsun@mellanox.com> wrote:
> >
> > This commit adds the TmFifo platform driver for Mellanox BlueField
> > Soc. TmFifo is a shared FIFO which enables external host machine
> > to exchange data with the SoC via USB or PCIe. The driver is based
> > on virtio framework and has console and network access enabled.
> 
> Thank you for an update.
> 
> Unfortunately more work is needed. My comments below.
> 
> > +#define MLXBF_TMFIFO_TX_STS__COUNT_RMASK               GENMASK(8, 0)
> > +#define MLXBF_TMFIFO_TX_STS__COUNT_MASK                        GENMASK(8, 0)
> 
> > +#define MLXBF_TMFIFO_TX_CTL__LWM_RMASK                 GENMASK(7, 0)
> > +#define MLXBF_TMFIFO_TX_CTL__LWM_MASK                  GENMASK(7, 0)
> 
> > +#define MLXBF_TMFIFO_TX_CTL__HWM_RMASK                 GENMASK(7, 0)
> > +#define MLXBF_TMFIFO_TX_CTL__HWM_MASK                  GENMASK(15, 8)
> 
> > +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RMASK         GENMASK(8, 0)
> > +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK          GENMASK_ULL(40, 32)
> 
> > +#define MLXBF_TMFIFO_RX_STS__COUNT_RMASK               GENMASK(8, 0)
> > +#define MLXBF_TMFIFO_RX_STS__COUNT_MASK                        GENMASK(8, 0)
> 
> > +#define MLXBF_TMFIFO_RX_CTL__LWM_RMASK                 GENMASK(7, 0)
> > +#define MLXBF_TMFIFO_RX_CTL__LWM_MASK                  GENMASK(7, 0)
> 
> > +#define MLXBF_TMFIFO_RX_CTL__HWM_RMASK                 GENMASK(7, 0)
> > +#define MLXBF_TMFIFO_RX_CTL__HWM_MASK                  GENMASK(15, 8)
> 
> > +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK         GENMASK(8, 0)
> > +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK          GENMASK_ULL(40, 32)
> 
> Since two of them have _ULL suffix I'm wondering if you have checked
> for side effects on the rest, i.e. if you operate with 64-bit variable
> and use something like ~MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK, it may
> give you interesting results.

The running system on the SoC is arm64, where BITS_PER_LONG and
BITS_PER_LONG_LONG have the same value, so the two macros end up being
the same. But you're right, I should use GENMASK_ULL() to be consistent
and more correct, just in case "CONFIG_64BIT" is somehow not defined.

Will update it in v11.

> 
> > +#define MLXBF_TMFIFO_TIMER_INTERVAL            (HZ / 10)
> 
> > +/**
> > + * mlxbf_tmfifo_u64 - Union of 64-bit data
> > + * @data - 64-bit data in host byte order
> > + * @data_le - 64-bit data in little-endian byte order
> > + *
> > + * It's expected to send 64-bit little-endian value (__le64) into the TmFifo.
> > + * readq() and writeq() expect u64 instead. A union structure is used here
> > + * to workaround the explicit casting usage like writeq(*(u64 *)&data_le).
> > + */
> 
> How do you know what readq()/writeq() do with the data? Is that true on
> all architectures?
> How does the endianness conversion affect the actual data?

The SoC runs arm64 and supports little endian for now. The FIFO has two sides:
one side is the SoC, the other side is the external host machine, which can
access the FIFO via USB or PCIe. The rule is that the byte stream stays the
same when one side writes 8 bytes and the other side reads those 8 bytes. So
as long as both sides agree on the byte order it should be fine.

After double-checking the arm64 readq()/writeq() implementation, it appears
that these APIs already do the cpu_to_le64() and le64_to_cpu()
conversion. There's actually no need to do another conversion
(and we shouldn't). I'll remove these conversions in v11. The code will
look much cleaner.
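
So, for instance, the Tx data word path (a sketch of what I plan for v11,
assuming writeq()/readq() handle the little-endian conversion internally,
as they do on arm64) simply becomes:

	/* no union mlxbf_tmfifo_u64, no explicit cpu_to_le64() */
	memcpy(&data, addr, sizeof(u64));
	writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);

and similarly plain readq() on the Rx side.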

> 
> > +union mlxbf_tmfifo_u64 {
> > +       u64 data;
> > +       __le64 data_le;
> > +};
> 
> > +/*
> > + * Default MAC.
> > + * This MAC address will be read from EFI persistent variable if configured.
> > + * It can also be reconfigured with standard Linux tools.
> > + */
> > +static u8 mlxbf_tmfifo_net_default_mac[6] = {
> > +       0x00, 0x1A, 0xCA, 0xFF, 0xFF, 0x01};
> 
> > +#define mlxbf_vdev_to_tmfifo(dev)      \
> > +       container_of(dev, struct mlxbf_tmfifo_vdev, vdev)
> 
> One line?

Couldn't fit it into one line within 80 characters.
(Please correct me if you meant a single line even beyond 80 characters.)

> 
> > +/* Return the consumed Tx buffer space. */
> > +static int mlxbf_tmfifo_vdev_tx_buf_len(struct mlxbf_tmfifo_vdev *tm_vdev)
> > +{
> > +       int len;
> > +
> > +       if (tm_vdev->tx_tail >= tm_vdev->tx_head)
> > +               len = tm_vdev->tx_tail - tm_vdev->tx_head;
> > +       else
> > +               len = MLXBF_TMFIFO_CONS_TX_BUF_SIZE - tm_vdev->tx_head +
> > +                       tm_vdev->tx_tail;
> > +       return len;
> > +}
> 
> Is this custom implementation of some kind of circ_buf?

Yes. I'll see if I can re-use the circ_buf structure and update it in v11.
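
(The idea would be to keep the console output buffer as a 'struct circ_buf'
from <linux/circ_buf.h> and use its helpers instead of the open-coded math;
roughly, with names here only for illustration:

	struct circ_buf *tx_buf = &cons->tx_buf;

	len   = CIRC_CNT(tx_buf->head, tx_buf->tail,
			 MLXBF_TMFIFO_CONS_TX_BUF_SIZE);
	avail = CIRC_SPACE(tx_buf->head, tx_buf->tail,
			   MLXBF_TMFIFO_CONS_TX_BUF_SIZE);

CIRC_CNT()/CIRC_SPACE() assume the size is a power of two, which SZ_32K is.)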

> 
> > +/* Allocate vrings for the fifo. */
> > +static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
> > +                                    struct mlxbf_tmfifo_vdev *tm_vdev)
> > +{
> > +       struct mlxbf_tmfifo_vring *vring;
> > +       struct device *dev;
> > +       dma_addr_t dma;
> > +       int i, size;
> > +       void *va;
> > +
> > +       for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
> > +               vring = &tm_vdev->vrings[i];
> > +               vring->fifo = fifo;
> > +               vring->num = MLXBF_TMFIFO_VRING_SIZE;
> > +               vring->align = SMP_CACHE_BYTES;
> > +               vring->index = i;
> > +               vring->vdev_id = tm_vdev->vdev.id.device;
> > +               dev = &tm_vdev->vdev.dev;
> > +
> > +               size = vring_size(vring->num, vring->align);
> > +               va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL);
> > +               if (!va) {
> 
> > +                       dev_err(dev->parent, "dma_alloc_coherent failed\n");
> 
> And how do you clean previously allocated items?

Fixed. I'll check the return value of mlxbf_tmfifo_alloc_vrings() and goto
'register_fail' (probably renamed to something better) instead of 'fail'.
In that case mlxbf_tmfifo_free_vrings() will be called to clean up
all previously allocated vrings.
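
i.e. roughly (the label name below is just a placeholder):

	if (mlxbf_tmfifo_alloc_vrings(fifo, tm_vdev)) {
		dev_err(dev, "unable to allocate vring\n");
		ret = -ENOMEM;
		goto vdev_fail;
	}
	...
vdev_fail:
	mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
	fifo->vdev[vdev_id] = NULL;
fail:
	mutex_unlock(&fifo->lock);
	return ret;

mlxbf_tmfifo_free_vrings() only frees rings with a non-NULL 'va', so a
partially allocated set is handled as well.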

> 
> > +                       return -ENOMEM;
> > +               }
> > +
> > +               vring->va = va;
> > +               vring->dma = dma;
> > +       }
> > +
> > +       return 0;
> > +}
> 
> > +/* Disable interrupts of the fifo device. */
> > +static void mlxbf_tmfifo_disable_irqs(struct mlxbf_tmfifo *fifo)
> > +{
> > +       int i, irq;
> > +
> > +       for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
> > +               irq = fifo->irq_info[i].irq;
> 
> > +               if (irq) {
> 
> I don't think this check is needed if you can guarantee that it has no
> stale records.

Yes, it's not needed any more according to the current code.
Will remove it in v11.

> 
> > +                       fifo->irq_info[i].irq = 0;
> > +                       disable_irq(irq);
> > +               }
> > +       }
> > +}
> 
> > +/* Get the number of available words in the TmFifo for sending. */
> > +static int mlxbf_tmfifo_get_tx_avail(struct mlxbf_tmfifo *fifo, int vdev_id)
> > +{
> > +       int tx_reserve;
> > +       u64 sts;
> > +
> > +       /* Reserve some room in FIFO for console messages. */
> > +       if (vdev_id == VIRTIO_ID_NET)
> > +               tx_reserve = fifo->tx_fifo_size / MLXBF_TMFIFO_RESERVE_RATIO;
> > +       else
> > +               tx_reserve = 1;
> > +
> > +       sts = readq(fifo->tx_base + MLXBF_TMFIFO_TX_STS);
> 
> > +       return (fifo->tx_fifo_size - tx_reserve -
> > +               FIELD_GET(MLXBF_TMFIFO_TX_STS__COUNT_MASK, sts));
> 
> Redundant parens.
> Moreover, consider
> 
> u32 count; // or whatever suits for FIELD_GET().
> ...
> 
> sts = readq(...);
> count = FIELD_GET(...);
> return ...;

Will update in v11.
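
i.e. something like:

	u32 count;
	u64 sts;

	...
	sts = readq(fifo->tx_base + MLXBF_TMFIFO_TX_STS);
	count = FIELD_GET(MLXBF_TMFIFO_TX_STS__COUNT_MASK, sts);
	return fifo->tx_fifo_size - tx_reserve - count;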

> 
> > +}
> 
> > +       while (size > 0) {
> > +               addr = cons->tx_buf + cons->tx_head;
> > +
> > +               if (cons->tx_head + sizeof(u64) <=
> > +                   MLXBF_TMFIFO_CONS_TX_BUF_SIZE) {
> > +                       memcpy(&data, addr, sizeof(u64));
> > +               } else {
> > +                       partial = MLXBF_TMFIFO_CONS_TX_BUF_SIZE - cons->tx_head;
> > +                       memcpy(&data, addr, partial);
> 
> > +                       memcpy((u8 *)&data + partial, cons->tx_buf,
> > +                              sizeof(u64) - partial);
> 
> Unaligned access?!

The code here builds and copies 8 bytes from the buffer into the 'data'
variable. The source could be unaligned. For example, 3 bytes are at the
end of the buffer and 5 bytes are at the beginning of the buffer. memcpy()
does a byte-stream copy, and the destination 'data' is an aligned local
u64, so it seems ok. Please correct me if I misunderstand the comment.

> 
> > +               }
> 
> > +               buf.data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
> > +               buf.data = le64_to_cpu(buf.data_le);
> 
> Are you sure this is correct?
> How did you test this on BE architectures?

Thanks for the comment! Same as above, the conversion is not really needed;
I'll remove it in v11. As for testing, we only have arm64 little-endian Linux
running on the SoC, so this conversion doesn't make much difference there.
For BE architectures, we mainly verify the other side of the FIFO, i.e. the
external host, for example on ppc64.

> 
> > +       tm_vdev = devm_kzalloc(dev, sizeof(*tm_vdev), GFP_KERNEL);
> 
> Is it appropriate use of devm_* ?

This is an SoC; the device won't be closed or detached. The only case is when
the driver is unloaded. So it appears ok to use devm_kzalloc() since it's
allocated during probe() and released during module unload. Please
correct me if I misunderstand it.

> 
> > +       if (!tm_vdev) {
> > +               ret = -ENOMEM;
> > +               goto fail;
> > +       }
> 
> > +/* Read the configured network MAC address from efi variable. */
> > +static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
> > +{
> > +       efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
> > +       efi_status_t status;
> > +       unsigned long size;
> 
> > +       u8 buf[6];
> 
> ETH_ALEN ?

Will update it in v11

> 
> > +
> > +       size = sizeof(buf);
> 
> Ditto.

Will update it in v11

> 
> > +       status = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size,
> > +                                 buf);
> 
> > +       if (status == EFI_SUCCESS && size == sizeof(buf))
> 
> Ditto.

Will update it in v11

> 
> > +               memcpy(mac, buf, sizeof(buf));
> 
> ether_addr_copy().

Will update it in v11

> 
> > +}
> 
> > +       memcpy(net_config.mac, mlxbf_tmfifo_net_default_mac, 6);
> 
> ether_addr_copy()...
> 
> > +       mlxbf_tmfifo_get_cfg_mac(net_config.mac);
> 
> ... but actually above should be part of this function.

Will update it in v11

> 
> --
> With Best Regards,
> Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v11] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
       [not found] <b143b40446c1870fb8d422b364ead95d54552be9.1527264077.git.lsun@mellanox.com>
                   ` (4 preceding siblings ...)
  2019-02-28 15:51 ` [PATCH v10] " Liming Sun
@ 2019-03-08 14:41 ` Liming Sun
  2019-03-26 21:13 ` Liming Sun
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Liming Sun @ 2019-03-08 14:41 UTC (permalink / raw)
  To: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak
  Cc: Liming Sun, linux-kernel, platform-driver-x86

This commit adds the TmFifo platform driver for Mellanox BlueField
Soc. TmFifo is a shared FIFO which enables external host machine
to exchange data with the SoC via USB or PCIe. The driver is based
on virtio framework and has console and network access enabled.

Reviewed-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Liming Sun <lsun@mellanox.com>
---
v10->v11:
    Fixes for comments from Andy:
    - Use GENMASK_ULL() instead of GENMASK() in mlxbf-tmfifo-regs.h
    - Removed the cpu_to_le64()/le64_to_cpu() conversion since
      readq()/writeq() already takes care of it.
    - Remove the "if (irq)" check in mlxbf_tmfifo_disable_irqs().
    - Add "u32 count" temp variable in mlxbf_tmfifo_get_tx_avail().
    - Clean up mlxbf_tmfifo_get_cfg_mac(), use ETH_ALEN instead of
      value 6.
    - Change the tx_buf to use Linux existing 'struct circ_buf'.
    Comment not applied:
    - "Change macro mlxbf_vdev_to_tmfifo() to one line"
      Couldn't fit it in one line within 80 characters.
    - "Is it appropriate use of devm_* for 'tm_vdev = devm_kzalloc'"
      This is a SoC; the device won't be closed or detached.
      The only case is when the driver is unloaded. So it appears
      OK to use devm_kzalloc() since it's allocated during probe()
      and released during module unload.
    Comments from Vadim: OK
v9->v10:
    Fixes for comments from Andy:
    - Use devm_ioremap_resource() instead of devm_ioremap().
    - Use kernel-doc comments.
    - Keep Makefile contents sorted.
    - Use same fixed format for offsets.
    - Use SZ_1K/SZ_32K instead of 1024/32*1024.
    - Remove unnecessary comments.
    - Use one style for max numbers.
    - More comments for mlxbf_tmfifo_vdev and mlxbf_tmfifo_data_64bit.
    - Use globally defined MTU instead of new definition.
    - Remove forward declaration of mlxbf_tmfifo_remove().
    - Remove PAGE_ALIGN() for dma_alloc_coherent().
    - Remove the cast of "struct vring *".
    - Check return result of test_and_set_bit().
    - Add a macro mlxbt_vdev_to_tmfifo().
    - Several other minor coding style comments.
    Comment not applied:
    - "Shouldn't be rather helper in EFI lib in kernel"
      Looks like efi.get_variable() is the existing way to do this in
      the kernel tree.
    - "this one is not protected anyhow? Potential race condition"
      In mlxbf_tmfifo_console_tx(), the spin-lock is used to protect the
      'tx_buf' only, not the FIFO writes. So there is no race condition.
    - "Is __packed needed in mlxbf_tmfifo_msg_hdr".
      Yes, it is needed to make sure the structure is 8 bytes.
    Fixes for comments from Vadim:
    - Use tab in mlxbf-tmfifo-regs.h
    - Use kernel-doc comments for struct mlxbf_tmfifo_msg_hdr and
      mlxbf_tmfifo_irq_info as well.
    - Use _MAX instead of _CNT in the macro definition to be consistent.
    - Fix the MODULE_LICENSE.
    - Use BIT_ULL() instead of BIT().
    - Remove the 'avail' argument from mlxbf_tmfifo_rxtx_header() and
      mlxbf_tmfifo_rxtx_word().
    - Revise logic in mlxbf_tmfifo_rxtx_one_desc() to remove the
      WARN_ON().
    - Change "union mlxbf_tmfifo_u64 u" to "union mlxbf_tmfifo_u64 buf"
      in mlxbf_tmfifo_rxtx_word().
    - Change data type of vring_change from 'int' to 'bool'.
    - Remove the blank lines after Signed-off.
    - Don't use declarations in the middle.
    - Initialize the network header in a more elegant way.
    - Change label done to mlxbf_tmfifo_desc_done.
    - Remove some unnecessary comments, and address several other misc
      coding-style comments.
    - Simplify code logic in mlxbf_tmfifo_virtio_notify()
    New changes by Liming:
    - Simplify the Rx/Tx function arguments to make it more readable.
v8->v9:
    Fixes for comments from Andy:
    - Use modern devm_xxx() API instead.
    Fixes for comments from Vadim:
    - Split the Rx/Tx function into smaller functions.
    - File name, copyright information.
    - Function and variable name conversion.
    - Local variable and indent coding styles.
    - Remove unnecessary 'inline' declarations.
    - Use devm_xxx() APIs.
    - Move the efi_char16_t MAC address definition to global.
    - Fix warnings reported by 'checkpatch --strict'.
    - Fix warnings reported by 'make CF="-D__CHECK_ENDIAN__"'.
    - Change 'select VIRTIO_xxx' to 'depends on VIRTIO_xxx' in Kconfig.
    - Merge mlxbf_tmfifo_vdev_tx_buf_push() and
      mlxbf_tmfifo_vdev_tx_buf_pop().
    - Add union to avoid casting between __le64 and u64.
    - Several other misc coding style comments.
    New changes by Liming:
    - Removed the DT binding documentation since only ACPI is
      supported for now by UEFI on the SoC.
v8: Re-submit under drivers/platform/mellanox for the target-side
    platform driver only.
v7: Added host side drivers into the same patch set.
v5~v6: Coding style fix.
v1~v4: Initial version for directory drivers/soc/mellanox.
---
 drivers/platform/mellanox/Kconfig             |   12 +-
 drivers/platform/mellanox/Makefile            |    1 +
 drivers/platform/mellanox/mlxbf-tmfifo-regs.h |   63 ++
 drivers/platform/mellanox/mlxbf-tmfifo.c      | 1286 +++++++++++++++++++++++++
 4 files changed, 1361 insertions(+), 1 deletion(-)
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo-regs.h
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo.c

diff --git a/drivers/platform/mellanox/Kconfig b/drivers/platform/mellanox/Kconfig
index cd8a908..530fe7e 100644
--- a/drivers/platform/mellanox/Kconfig
+++ b/drivers/platform/mellanox/Kconfig
@@ -5,7 +5,7 @@
 
 menuconfig MELLANOX_PLATFORM
 	bool "Platform support for Mellanox hardware"
-	depends on X86 || ARM || COMPILE_TEST
+	depends on X86 || ARM || ARM64 || COMPILE_TEST
 	---help---
 	  Say Y here to get to see options for platform support for
 	  Mellanox systems. This option alone does not add any kernel code.
@@ -34,4 +34,14 @@ config MLXREG_IO
 	  to system resets operation, system reset causes monitoring and some
 	  kinds of mux selection.
 
+config MLXBF_TMFIFO
+	tristate "Mellanox BlueField SoC TmFifo platform driver"
+	depends on ARM64
+	depends on ACPI
+	depends on VIRTIO_CONSOLE && VIRTIO_NET
+	help
+	  Say y here to enable TmFifo support. The TmFifo driver provides
+	  platform driver support for the TmFifo which supports console
+	  and networking based on the virtio framework.
+
 endif # MELLANOX_PLATFORM
diff --git a/drivers/platform/mellanox/Makefile b/drivers/platform/mellanox/Makefile
index 57074d9c..a229bda1 100644
--- a/drivers/platform/mellanox/Makefile
+++ b/drivers/platform/mellanox/Makefile
@@ -3,5 +3,6 @@
 # Makefile for linux/drivers/platform/mellanox
 # Mellanox Platform-Specific Drivers
 #
+obj-$(CONFIG_MLXBF_TMFIFO)	+= mlxbf-tmfifo.o
 obj-$(CONFIG_MLXREG_HOTPLUG)	+= mlxreg-hotplug.o
 obj-$(CONFIG_MLXREG_IO) += mlxreg-io.o
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo-regs.h b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
new file mode 100644
index 0000000..e4f0d2e
--- /dev/null
+++ b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
+ */
+
+#ifndef __MLXBF_TMFIFO_REGS_H__
+#define __MLXBF_TMFIFO_REGS_H__
+
+#include <linux/types.h>
+#include <linux/bits.h>
+
+#define MLXBF_TMFIFO_TX_DATA				0x00
+#define MLXBF_TMFIFO_TX_STS				0x08
+#define MLXBF_TMFIFO_TX_STS__LENGTH			0x0001
+#define MLXBF_TMFIFO_TX_STS__COUNT_SHIFT		0
+#define MLXBF_TMFIFO_TX_STS__COUNT_WIDTH		9
+#define MLXBF_TMFIFO_TX_STS__COUNT_RESET_VAL		0
+#define MLXBF_TMFIFO_TX_STS__COUNT_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_TX_STS__COUNT_MASK			GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_TX_CTL				0x10
+#define MLXBF_TMFIFO_TX_CTL__LENGTH			0x0001
+#define MLXBF_TMFIFO_TX_CTL__LWM_SHIFT			0
+#define MLXBF_TMFIFO_TX_CTL__LWM_WIDTH			8
+#define MLXBF_TMFIFO_TX_CTL__LWM_RESET_VAL		128
+#define MLXBF_TMFIFO_TX_CTL__LWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__LWM_MASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__HWM_SHIFT			8
+#define MLXBF_TMFIFO_TX_CTL__HWM_WIDTH			8
+#define MLXBF_TMFIFO_TX_CTL__HWM_RESET_VAL		128
+#define MLXBF_TMFIFO_TX_CTL__HWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__HWM_MASK			GENMASK_ULL(15, 8)
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_SHIFT		32
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_WIDTH		9
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RESET_VAL	256
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK		GENMASK_ULL(40, 32)
+#define MLXBF_TMFIFO_RX_DATA				0x00
+#define MLXBF_TMFIFO_RX_STS				0x08
+#define MLXBF_TMFIFO_RX_STS__LENGTH			0x0001
+#define MLXBF_TMFIFO_RX_STS__COUNT_SHIFT		0
+#define MLXBF_TMFIFO_RX_STS__COUNT_WIDTH		9
+#define MLXBF_TMFIFO_RX_STS__COUNT_RESET_VAL		0
+#define MLXBF_TMFIFO_RX_STS__COUNT_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_RX_STS__COUNT_MASK			GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_RX_CTL				0x10
+#define MLXBF_TMFIFO_RX_CTL__LENGTH			0x0001
+#define MLXBF_TMFIFO_RX_CTL__LWM_SHIFT			0
+#define MLXBF_TMFIFO_RX_CTL__LWM_WIDTH			8
+#define MLXBF_TMFIFO_RX_CTL__LWM_RESET_VAL		128
+#define MLXBF_TMFIFO_RX_CTL__LWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__LWM_MASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__HWM_SHIFT			8
+#define MLXBF_TMFIFO_RX_CTL__HWM_WIDTH			8
+#define MLXBF_TMFIFO_RX_CTL__HWM_RESET_VAL		128
+#define MLXBF_TMFIFO_RX_CTL__HWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__HWM_MASK			GENMASK_ULL(15, 8)
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_SHIFT		32
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_WIDTH		9
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RESET_VAL	256
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK		GENMASK_ULL(40, 32)
+
+#endif /* !defined(__MLXBF_TMFIFO_REGS_H__) */
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo.c b/drivers/platform/mellanox/mlxbf-tmfifo.c
new file mode 100644
index 0000000..0a31ffa
--- /dev/null
+++ b/drivers/platform/mellanox/mlxbf-tmfifo.c
@@ -0,0 +1,1286 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Mellanox BlueField SoC TmFifo driver
+ *
+ * Copyright (C) 2019 Mellanox Technologies
+ */
+
+#include <linux/acpi.h>
+#include <linux/bitfield.h>
+#include <linux/circ_buf.h>
+#include <linux/efi.h>
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/platform_device.h>
+#include <linux/types.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_console.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_net.h>
+#include <linux/virtio_ring.h>
+
+#include "mlxbf-tmfifo-regs.h"
+
+/* Vring size. */
+#define MLXBF_TMFIFO_VRING_SIZE			SZ_1K
+
+/* Console Tx buffer size. */
+#define MLXBF_TMFIFO_CON_TX_BUF_SIZE		SZ_32K
+
+/* Console Tx buffer reserved space. */
+#define MLXBF_TMFIFO_CON_TX_BUF_RSV_SIZE	8
+
+/* House-keeping timer interval. */
+#define MLXBF_TMFIFO_TIMER_INTERVAL		(HZ / 10)
+
+/* Virtual devices sharing the TM FIFO. */
+#define MLXBF_TMFIFO_VDEV_MAX		(VIRTIO_ID_CONSOLE + 1)
+
+/*
+ * Reserve 1/16 of TmFifo space, so console messages are not starved by
+ * the networking traffic.
+ */
+#define MLXBF_TMFIFO_RESERVE_RATIO		16
+
+/* Message with data needs at least two words (for header & data). */
+#define MLXBF_TMFIFO_DATA_MIN_WORDS		2
+
+struct mlxbf_tmfifo;
+
+/**
+ * mlxbf_tmfifo_vring - Structure of the TmFifo virtual ring
+ * @va: virtual address of the ring
+ * @dma: dma address of the ring
+ * @vq: pointer to the virtio virtqueue
+ * @desc: current descriptor of the pending packet
+ * @desc_head: head descriptor of the pending packet
+ * @cur_len: processed length of the current descriptor
+ * @rem_len: remaining length of the pending packet
+ * @pkt_len: total length of the pending packet
+ * @next_avail: next avail descriptor id
+ * @num: vring size (number of descriptors)
+ * @align: vring alignment size
+ * @index: vring index
+ * @vdev_id: vring virtio id (VIRTIO_ID_xxx)
+ * @fifo: pointer to the tmfifo structure
+ */
+struct mlxbf_tmfifo_vring {
+	void *va;
+	dma_addr_t dma;
+	struct virtqueue *vq;
+	struct vring_desc *desc;
+	struct vring_desc *desc_head;
+	int cur_len;
+	int rem_len;
+	u32 pkt_len;
+	u16 next_avail;
+	int num;
+	int align;
+	int index;
+	int vdev_id;
+	struct mlxbf_tmfifo *fifo;
+};
+
+/* Interrupt types. */
+enum {
+	MLXBF_TM_RX_LWM_IRQ,
+	MLXBF_TM_RX_HWM_IRQ,
+	MLXBF_TM_TX_LWM_IRQ,
+	MLXBF_TM_TX_HWM_IRQ,
+	MLXBF_TM_MAX_IRQ
+};
+
+/* Ring types (Rx & Tx). */
+enum {
+	MLXBF_TMFIFO_VRING_RX,
+	MLXBF_TMFIFO_VRING_TX,
+	MLXBF_TMFIFO_VRING_MAX
+};
+
+/**
+ * mlxbf_tmfifo_vdev - Structure of the TmFifo virtual device
+ * @vdev: virtio device, in which the vdev.id.device field has the
+ *        VIRTIO_ID_xxx id to distinguish the virtual device.
+ * @status: status of the device
+ * @features: supported features of the device
+ * @vrings: array of tmfifo vrings of this device
+ * @config.cons: virtual console config -
+ *               select if vdev.id.device is VIRTIO_ID_CONSOLE
+ * @config.net: virtual network config -
+ *              select if vdev.id.device is VIRTIO_ID_NET
+ * @tx_buf: tx buffer used to buffer data before writing into the FIFO
+ */
+struct mlxbf_tmfifo_vdev {
+	struct virtio_device vdev;
+	u8 status;
+	u64 features;
+	struct mlxbf_tmfifo_vring vrings[MLXBF_TMFIFO_VRING_MAX];
+	union {
+		struct virtio_console_config cons;
+		struct virtio_net_config net;
+	} config;
+	struct circ_buf tx_buf;
+};
+
+/**
+ * mlxbf_tmfifo_irq_info - Structure of the interrupt information
+ * @fifo: pointer to the tmfifo structure
+ * @irq: interrupt number
+ * @index: index into the interrupt array
+ */
+struct mlxbf_tmfifo_irq_info {
+	struct mlxbf_tmfifo *fifo;
+	int irq;
+	int index;
+};
+
+/**
+ * mlxbf_tmfifo - Structure of the TmFifo
+ * @vdev: array of the virtual devices running over the TmFifo
+ * @pdev: platform device
+ * @lock: lock to protect the TmFifo access
+ * @rx_base: mapped register base address for the Rx fifo
+ * @tx_base: mapped register base address for the Tx fifo
+ * @rx_fifo_size: number of entries of the Rx fifo
+ * @tx_fifo_size: number of entries of the Tx fifo
+ * @pend_events: pending bits for deferred events
+ * @irq_info: interrupt information
+ * @work: work struct for deferred process
+ * @timer: background timer
+ * @vring: Tx/Rx ring
+ * @spin_lock: spin lock
+ * @is_ready: ready flag
+ */
+struct mlxbf_tmfifo {
+	struct mlxbf_tmfifo_vdev *vdev[MLXBF_TMFIFO_VDEV_MAX];
+	struct platform_device *pdev;
+	struct mutex lock;		/* TmFifo lock */
+	void __iomem *rx_base;
+	void __iomem *tx_base;
+	int rx_fifo_size;
+	int tx_fifo_size;
+	unsigned long pend_events;
+	struct mlxbf_tmfifo_irq_info irq_info[MLXBF_TM_MAX_IRQ];
+	struct work_struct work;
+	struct timer_list timer;
+	struct mlxbf_tmfifo_vring *vring[2];
+	spinlock_t spin_lock;		/* spin lock */
+	bool is_ready;
+};
+
+/**
+ * mlxbf_tmfifo_msg_hdr - Structure of the TmFifo message header
+ * @type: message type
+ * @len: payload length
+ * @data: 64-bit raw data
+ */
+union mlxbf_tmfifo_msg_hdr {
+	struct {
+		u8 type;
+		__be16 len;
+		u8 unused[5];
+	} __packed;
+	u64 data;
+};
+
+/*
+ * Default MAC.
+ * This MAC address will be read from EFI persistent variable if configured.
+ * It can also be reconfigured with standard Linux tools.
+ */
+static u8 mlxbf_tmfifo_net_default_mac[ETH_ALEN] = {
+	0x00, 0x1A, 0xCA, 0xFF, 0xFF, 0x01};
+
+/* EFI variable name of the MAC address. */
+static efi_char16_t mlxbf_tmfifo_efi_name[] = L"RshimMacAddr";
+
+/* Maximum L2 header length. */
+#define MLXBF_TMFIFO_NET_L2_OVERHEAD	36
+
+/* Supported virtio-net features. */
+#define MLXBF_TMFIFO_NET_FEATURES	(BIT_ULL(VIRTIO_NET_F_MTU) | \
+					 BIT_ULL(VIRTIO_NET_F_STATUS) | \
+					 BIT_ULL(VIRTIO_NET_F_MAC))
+
+#define mlxbf_vdev_to_tmfifo(dev)	\
+	container_of(dev, struct mlxbf_tmfifo_vdev, vdev)
+
+/* Allocate vrings for the fifo. */
+static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
+				     struct mlxbf_tmfifo_vdev *tm_vdev)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	struct device *dev;
+	dma_addr_t dma;
+	int i, size;
+	void *va;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+		vring->fifo = fifo;
+		vring->num = MLXBF_TMFIFO_VRING_SIZE;
+		vring->align = SMP_CACHE_BYTES;
+		vring->index = i;
+		vring->vdev_id = tm_vdev->vdev.id.device;
+		dev = &tm_vdev->vdev.dev;
+
+		size = vring_size(vring->num, vring->align);
+		va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL);
+		if (!va) {
+			dev_err(dev->parent, "dma_alloc_coherent failed\n");
+			return -ENOMEM;
+		}
+
+		vring->va = va;
+		vring->dma = dma;
+	}
+
+	return 0;
+}
+
+/* Free vrings of the fifo device. */
+static void mlxbf_tmfifo_free_vrings(struct mlxbf_tmfifo *fifo,
+				     struct mlxbf_tmfifo_vdev *tm_vdev)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	int i, size;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+		if (vring->va) {
+			size = vring_size(vring->num, vring->align);
+			dma_free_coherent(tm_vdev->vdev.dev.parent, size,
+					  vring->va, vring->dma);
+			vring->va = NULL;
+			if (vring->vq) {
+				vring_del_virtqueue(vring->vq);
+				vring->vq = NULL;
+			}
+		}
+	}
+}
+
+/* Disable interrupts of the fifo device. */
+static void mlxbf_tmfifo_disable_irqs(struct mlxbf_tmfifo *fifo)
+{
+	int i, irq;
+
+	for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
+		irq = fifo->irq_info[i].irq;
+		fifo->irq_info[i].irq = 0;
+		disable_irq(irq);
+	}
+}
+
+/* Interrupt handler. */
+static irqreturn_t mlxbf_tmfifo_irq_handler(int irq, void *arg)
+{
+	struct mlxbf_tmfifo_irq_info *irq_info = arg;
+
+	if (irq_info->index < MLXBF_TM_MAX_IRQ &&
+	    !test_and_set_bit(irq_info->index, &irq_info->fifo->pend_events))
+		schedule_work(&irq_info->fifo->work);
+
+	return IRQ_HANDLED;
+}
+
+/* Get the next packet descriptor from the vring. */
+static struct vring_desc *
+mlxbf_tmfifo_get_next_desc(struct mlxbf_tmfifo_vring *vring)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	unsigned int idx, head;
+
+	if (vring->next_avail == virtio16_to_cpu(vdev, vr->avail->idx))
+		return NULL;
+
+	idx = vring->next_avail % vr->num;
+	head = virtio16_to_cpu(vdev, vr->avail->ring[idx]);
+	if (WARN_ON(head >= vr->num))
+		return NULL;
+
+	vring->next_avail++;
+
+	return &vr->desc[head];
+}
+
+/* Release virtio descriptor. */
+static void mlxbf_tmfifo_release_desc(struct mlxbf_tmfifo_vring *vring,
+				      struct vring_desc *desc, u32 len)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	u16 idx, vr_idx;
+
+	vr_idx = virtio16_to_cpu(vdev, vr->used->idx);
+	idx = vr_idx % vr->num;
+	vr->used->ring[idx].id = cpu_to_virtio32(vdev, desc - vr->desc);
+	vr->used->ring[idx].len = cpu_to_virtio32(vdev, len);
+
+	/*
+	 * Virtio could poll and check the 'idx' to decide whether the desc is
+	 * done or not. Add a memory barrier here to make sure the update above
+	 * completes before updating the idx.
+	 */
+	mb();
+	vr->used->idx = cpu_to_virtio16(vdev, vr_idx + 1);
+}
+
+/* Get the total length of the descriptor chain. */
+static u32 mlxbf_tmfifo_get_pkt_len(struct mlxbf_tmfifo_vring *vring,
+				    struct vring_desc *desc)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	u32 len = 0, idx;
+
+	while (desc) {
+		len += virtio32_to_cpu(vdev, desc->len);
+		if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
+			break;
+		idx = virtio16_to_cpu(vdev, desc->next);
+		desc = &vr->desc[idx];
+	}
+
+	return len;
+}
+
+static void mlxbf_tmfifo_release_pending_pkt(struct mlxbf_tmfifo_vring *vring)
+{
+	struct vring_desc *desc_head;
+	u32 len = 0;
+
+	if (vring->desc_head) {
+		desc_head = vring->desc_head;
+		len = vring->pkt_len;
+	} else {
+		desc_head = mlxbf_tmfifo_get_next_desc(vring);
+		if (desc_head)
+			len = mlxbf_tmfifo_get_pkt_len(vring, desc_head);
+	}
+
+	if (desc_head)
+		mlxbf_tmfifo_release_desc(vring, desc_head, len);
+
+	vring->pkt_len = 0;
+	vring->desc = NULL;
+	vring->desc_head = NULL;
+}
+
+static void mlxbf_tmfifo_init_net_desc(struct mlxbf_tmfifo_vring *vring,
+				       struct vring_desc *desc, bool is_rx)
+{
+	struct virtio_device *vdev = vring->vq->vdev;
+	struct virtio_net_hdr *net_hdr;
+
+	net_hdr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+	memset(net_hdr, 0, sizeof(*net_hdr));
+}
+
+/* Get and initialize the next packet. */
+static struct vring_desc *
+mlxbf_tmfifo_get_next_pkt(struct mlxbf_tmfifo_vring *vring, bool is_rx)
+{
+	struct vring_desc *desc;
+
+	desc = mlxbf_tmfifo_get_next_desc(vring);
+	if (desc && is_rx && vring->vdev_id == VIRTIO_ID_NET)
+		mlxbf_tmfifo_init_net_desc(vring, desc, is_rx);
+
+	vring->desc_head = desc;
+	vring->desc = desc;
+
+	return desc;
+}
+
+/* House-keeping timer. */
+static void mlxbf_tmfifo_timer(struct timer_list *arg)
+{
+	struct mlxbf_tmfifo *fifo = container_of(arg, struct mlxbf_tmfifo,
+						 timer);
+	int more;
+
+	more = !test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events) ||
+		    !test_and_set_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events);
+
+	if (more)
+		schedule_work(&fifo->work);
+
+	mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
+}
+
+/* Copy one console packet into the output buffer. */
+static void mlxbf_tmfifo_console_output_one(struct mlxbf_tmfifo_vdev *cons,
+					    struct mlxbf_tmfifo_vring *vring,
+					    struct vring_desc *desc)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = &cons->vdev;
+	u32 len, idx, seg;
+	void *addr;
+
+	while (desc) {
+		addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+		len = virtio32_to_cpu(vdev, desc->len);
+
+		seg = CIRC_SPACE_TO_END(cons->tx_buf.head, cons->tx_buf.tail,
+					MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+		if (len <= seg) {
+			memcpy(cons->tx_buf.buf + cons->tx_buf.head, addr, len);
+		} else {
+			memcpy(cons->tx_buf.buf + cons->tx_buf.head, addr, seg);
+			addr += seg;
+			memcpy(cons->tx_buf.buf, addr, len - seg);
+		}
+		cons->tx_buf.head = (cons->tx_buf.head + len) %
+			MLXBF_TMFIFO_CON_TX_BUF_SIZE;
+
+		if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
+			break;
+		idx = virtio16_to_cpu(vdev, desc->next);
+		desc = &vr->desc[idx];
+	}
+}
+
+/* Copy console data into the output buffer. */
+static void mlxbf_tmfifo_console_output(struct mlxbf_tmfifo_vdev *cons,
+					struct mlxbf_tmfifo_vring *vring)
+{
+	struct vring_desc *desc;
+	u32 len, avail;
+
+	desc = mlxbf_tmfifo_get_next_desc(vring);
+	while (desc) {
+		/* Release the packet if not enough space. */
+		len = mlxbf_tmfifo_get_pkt_len(vring, desc);
+		avail = CIRC_SPACE(cons->tx_buf.head, cons->tx_buf.tail,
+				   MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+		if (len + MLXBF_TMFIFO_CON_TX_BUF_RSV_SIZE > avail) {
+			mlxbf_tmfifo_release_desc(vring, desc, len);
+			break;
+		}
+
+		mlxbf_tmfifo_console_output_one(cons, vring, desc);
+		mlxbf_tmfifo_release_desc(vring, desc, len);
+		desc = mlxbf_tmfifo_get_next_desc(vring);
+	}
+}
+
+/* Get the number of available words in Rx FIFO for receiving. */
+static int mlxbf_tmfifo_get_rx_avail(struct mlxbf_tmfifo *fifo)
+{
+	u64 sts;
+
+	sts = readq(fifo->rx_base + MLXBF_TMFIFO_RX_STS);
+	return FIELD_GET(MLXBF_TMFIFO_RX_STS__COUNT_MASK, sts);
+}
+
+/* Get the number of available words in the TmFifo for sending. */
+static int mlxbf_tmfifo_get_tx_avail(struct mlxbf_tmfifo *fifo, int vdev_id)
+{
+	int tx_reserve;
+	u32 count;
+	u64 sts;
+
+	/* Reserve some room in FIFO for console messages. */
+	if (vdev_id == VIRTIO_ID_NET)
+		tx_reserve = fifo->tx_fifo_size / MLXBF_TMFIFO_RESERVE_RATIO;
+	else
+		tx_reserve = 1;
+
+	sts = readq(fifo->tx_base + MLXBF_TMFIFO_TX_STS);
+	count = FIELD_GET(MLXBF_TMFIFO_TX_STS__COUNT_MASK, sts);
+	return fifo->tx_fifo_size - tx_reserve - count;
+}
+
+/* Console Tx (move data from the output buffer into the TmFifo). */
+static void mlxbf_tmfifo_console_tx(struct mlxbf_tmfifo *fifo, int avail)
+{
+	union mlxbf_tmfifo_msg_hdr hdr;
+	struct mlxbf_tmfifo_vdev *cons;
+	unsigned long flags;
+	int size, seg;
+	void *addr;
+	u64 data;
+
+	/* Return if not enough space available. */
+	if (avail < MLXBF_TMFIFO_DATA_MIN_WORDS)
+		return;
+
+	cons = fifo->vdev[VIRTIO_ID_CONSOLE];
+	if (!cons || !cons->tx_buf.buf)
+		return;
+
+	/* Return if no data to send. */
+	size = CIRC_CNT(cons->tx_buf.head, cons->tx_buf.tail,
+			MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+	if (size == 0)
+		return;
+
+	/* Adjust the size to available space. */
+	if (size + sizeof(hdr) > avail * sizeof(u64))
+		size = avail * sizeof(u64) - sizeof(hdr);
+
+	/* Write header. */
+	hdr.data = 0;
+	hdr.type = VIRTIO_ID_CONSOLE;
+	hdr.len = htons(size);
+	writeq(hdr.data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+
+	/* Use spin-lock to protect the 'cons->tx_buf'. */
+	spin_lock_irqsave(&fifo->spin_lock, flags);
+
+	while (size > 0) {
+		addr = cons->tx_buf.buf + cons->tx_buf.tail;
+
+		seg = CIRC_CNT_TO_END(cons->tx_buf.head, cons->tx_buf.tail,
+				      MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+		if (seg >= sizeof(u64)) {
+			memcpy(&data, addr, sizeof(u64));
+		} else {
+			memcpy(&data, addr, seg);
+			memcpy((u8 *)&data + seg, cons->tx_buf.buf,
+			       sizeof(u64) - seg);
+		}
+		writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+
+		if (size >= sizeof(u64)) {
+			cons->tx_buf.tail = (cons->tx_buf.tail + sizeof(u64)) %
+				MLXBF_TMFIFO_CON_TX_BUF_SIZE;
+			size -= sizeof(u64);
+		} else {
+			cons->tx_buf.tail = (cons->tx_buf.tail + size) %
+				MLXBF_TMFIFO_CON_TX_BUF_SIZE;
+			size = 0;
+		}
+	}
+
+	spin_unlock_irqrestore(&fifo->spin_lock, flags);
+}
+
+/* Rx/Tx one word in the descriptor buffer. */
+static void mlxbf_tmfifo_rxtx_word(struct mlxbf_tmfifo_vring *vring,
+				   struct vring_desc *desc,
+				   bool is_rx, int len)
+{
+	struct virtio_device *vdev = vring->vq->vdev;
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	void *addr;
+	u64 data;
+
+	/* Get the buffer address of this desc. */
+	addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+
+	/* Read a word from FIFO for Rx. */
+	if (is_rx)
+		data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
+
+	if (vring->cur_len + sizeof(u64) <= len) {
+		/* The whole word. */
+		if (is_rx)
+			memcpy(addr + vring->cur_len, &data, sizeof(u64));
+		else
+			memcpy(&data, addr + vring->cur_len, sizeof(u64));
+		vring->cur_len += sizeof(u64);
+	} else {
+		/* Leftover bytes. */
+		if (is_rx)
+			memcpy(addr + vring->cur_len, &data,
+			       len - vring->cur_len);
+		else
+			memcpy(&data, addr + vring->cur_len,
+			       len - vring->cur_len);
+		vring->cur_len = len;
+	}
+
+	/* Write the word into FIFO for Tx. */
+	if (!is_rx)
+		writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+}
+
+/*
+ * Rx/Tx packet header.
+ *
+ * In Rx case, the packet might be found to belong to a different vring since
+ * the TmFifo is shared by different services. In such case, the 'vring_change'
+ * flag is set.
+ */
+static void mlxbf_tmfifo_rxtx_header(struct mlxbf_tmfifo_vring *vring,
+				     struct vring_desc *desc,
+				     bool is_rx, bool *vring_change)
+{
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	struct virtio_net_config *config;
+	union mlxbf_tmfifo_msg_hdr hdr;
+	int vdev_id, hdr_len;
+
+	/* Read/Write packet header. */
+	if (is_rx) {
+		/* Drain one word from the FIFO. */
+		hdr.data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
+
+		/* Skip the length 0 packets (keepalive). */
+		if (hdr.len == 0)
+			return;
+
+		/* Check packet type. */
+		if (hdr.type == VIRTIO_ID_NET) {
+			vdev_id = VIRTIO_ID_NET;
+			hdr_len = sizeof(struct virtio_net_hdr);
+			config = &fifo->vdev[vdev_id]->config.net;
+			if (ntohs(hdr.len) > config->mtu +
+			    MLXBF_TMFIFO_NET_L2_OVERHEAD)
+				return;
+		} else {
+			vdev_id = VIRTIO_ID_CONSOLE;
+			hdr_len = 0;
+		}
+
+		/*
+		 * Check whether the new packet still belongs to this vring.
+		 * If not, update the pkt_len of the new vring.
+		 */
+		if (vdev_id != vring->vdev_id) {
+			struct mlxbf_tmfifo_vdev *tm_dev2 = fifo->vdev[vdev_id];
+
+			if (!tm_dev2)
+				return;
+			vring->desc = desc;
+			vring = &tm_dev2->vrings[MLXBF_TMFIFO_VRING_RX];
+			*vring_change = true;
+		}
+		vring->pkt_len = ntohs(hdr.len) + hdr_len;
+	} else {
+		/* Network virtio has an extra header. */
+		hdr_len = (vring->vdev_id == VIRTIO_ID_NET) ?
+			   sizeof(struct virtio_net_hdr) : 0;
+		vring->pkt_len = mlxbf_tmfifo_get_pkt_len(vring, desc);
+		hdr.data = 0;
+		hdr.type = (vring->vdev_id == VIRTIO_ID_NET) ?
+			    VIRTIO_ID_NET : VIRTIO_ID_CONSOLE;
+		hdr.len = htons(vring->pkt_len - hdr_len);
+		writeq(hdr.data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+	}
+
+	vring->cur_len = hdr_len;
+	vring->rem_len = vring->pkt_len;
+	fifo->vring[is_rx] = vring;
+}
+
+/*
+ * Rx/Tx one descriptor.
+ *
+ * Return true to indicate more data available.
+ */
+static bool mlxbf_tmfifo_rxtx_one_desc(struct mlxbf_tmfifo_vring *vring,
+				       bool is_rx, int *avail)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	struct virtio_device *vdev;
+	bool vring_change = false;
+	struct vring_desc *desc;
+	unsigned long flags;
+	u32 len, idx;
+
+	vdev = &fifo->vdev[vring->vdev_id]->vdev;
+
+	/* Get the descriptor of the next packet. */
+	if (!vring->desc) {
+		desc = mlxbf_tmfifo_get_next_pkt(vring, is_rx);
+		if (!desc)
+			return false;
+	} else {
+		desc = vring->desc;
+	}
+
+	/* Beginning of a packet. Start to Rx/Tx packet header. */
+	if (vring->pkt_len == 0) {
+		mlxbf_tmfifo_rxtx_header(vring, desc, is_rx, &vring_change);
+		(*avail)--;
+
+		/* Return if new packet is for another ring. */
+		if (vring_change)
+			return false;
+		goto mlxbf_tmfifo_desc_done;
+	}
+
+	/* Get the length of this desc. */
+	len = virtio32_to_cpu(vdev, desc->len);
+	if (len > vring->rem_len)
+		len = vring->rem_len;
+
+	/* Rx/Tx one word (8 bytes) if not done. */
+	if (vring->cur_len < len) {
+		mlxbf_tmfifo_rxtx_word(vring, desc, is_rx, len);
+		(*avail)--;
+	}
+
+	/* Check again whether it's done. */
+	if (vring->cur_len == len) {
+		vring->cur_len = 0;
+		vring->rem_len -= len;
+
+		/* Get the next desc on the chain. */
+		if (vring->rem_len > 0 &&
+		    (virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT)) {
+			idx = virtio16_to_cpu(vdev, desc->next);
+			desc = &vr->desc[idx];
+			goto mlxbf_tmfifo_desc_done;
+		}
+
+		/* Done and release the pending packet. */
+		mlxbf_tmfifo_release_pending_pkt(vring);
+		desc = NULL;
+		fifo->vring[is_rx] = NULL;
+
+		/* Notify upper layer that packet is done. */
+		spin_lock_irqsave(&fifo->spin_lock, flags);
+		vring_interrupt(0, vring->vq);
+		spin_unlock_irqrestore(&fifo->spin_lock, flags);
+	}
+
+mlxbf_tmfifo_desc_done:
+	/* Save the current desc. */
+	vring->desc = desc;
+
+	return true;
+}
+
+/* Rx & Tx processing of a queue. */
+static void mlxbf_tmfifo_rxtx(struct mlxbf_tmfifo_vring *vring, bool is_rx)
+{
+	int avail = 0, devid = vring->vdev_id;
+	struct mlxbf_tmfifo *fifo;
+	bool more;
+
+	fifo = vring->fifo;
+
+	/* Return if vdev is not ready. */
+	if (!fifo->vdev[devid])
+		return;
+
+	/* Return if another vring is running. */
+	if (fifo->vring[is_rx] && fifo->vring[is_rx] != vring)
+		return;
+
+	/* Only handle console and network for now. */
+	if (WARN_ON(devid != VIRTIO_ID_NET && devid != VIRTIO_ID_CONSOLE))
+		return;
+
+	do {
+		/* Get available FIFO space. */
+		if (avail == 0) {
+			if (is_rx)
+				avail = mlxbf_tmfifo_get_rx_avail(fifo);
+			else
+				avail = mlxbf_tmfifo_get_tx_avail(fifo, devid);
+			if (avail <= 0)
+				break;
+		}
+
+		/* Console output always comes from the Tx buffer. */
+		if (!is_rx && devid == VIRTIO_ID_CONSOLE) {
+			mlxbf_tmfifo_console_tx(fifo, avail);
+			break;
+		}
+
+		/* Handle one descriptor. */
+		more = mlxbf_tmfifo_rxtx_one_desc(vring, is_rx, &avail);
+	} while (more);
+}
+
+/* Handle Rx or Tx queues. */
+static void mlxbf_tmfifo_work_rxtx(struct mlxbf_tmfifo *fifo, int queue_id,
+				   int irq_id, bool is_rx)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct mlxbf_tmfifo_vring *vring;
+	int i;
+
+	if (!test_and_clear_bit(irq_id, &fifo->pend_events) ||
+	    !fifo->irq_info[irq_id].irq)
+		return;
+
+	for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++) {
+		tm_vdev = fifo->vdev[i];
+		if (tm_vdev) {
+			vring = &tm_vdev->vrings[queue_id];
+			if (vring->vq)
+				mlxbf_tmfifo_rxtx(vring, is_rx);
+		}
+	}
+}
+
+/* Work handler for Rx and Tx case. */
+static void mlxbf_tmfifo_work_handler(struct work_struct *work)
+{
+	struct mlxbf_tmfifo *fifo;
+
+	fifo = container_of(work, struct mlxbf_tmfifo, work);
+	if (!fifo->is_ready)
+		return;
+
+	mutex_lock(&fifo->lock);
+
+	/* Tx (Send data to the TmFifo). */
+	mlxbf_tmfifo_work_rxtx(fifo, MLXBF_TMFIFO_VRING_TX,
+			       MLXBF_TM_TX_LWM_IRQ, false);
+
+	/* Rx (Receive data from the TmFifo). */
+	mlxbf_tmfifo_work_rxtx(fifo, MLXBF_TMFIFO_VRING_RX,
+			       MLXBF_TM_RX_HWM_IRQ, true);
+
+	mutex_unlock(&fifo->lock);
+}
+
+/* The notify function is called when new buffers are posted. */
+static bool mlxbf_tmfifo_virtio_notify(struct virtqueue *vq)
+{
+	struct mlxbf_tmfifo_vring *vring = vq->priv;
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct mlxbf_tmfifo *fifo;
+	unsigned long flags;
+
+	fifo = vring->fifo;
+
+	/*
+	 * Virtio maintains vrings in pairs, even number ring for Rx
+	 * and odd number ring for Tx.
+	 */
+	if (!(vring->index & BIT(0))) {
+		if (test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events))
+			return true;
+	} else {
+		/*
+		 * Console could make blocking call with interrupts disabled.
+		 * In such case, the vring needs to be served right away. For
+		 * other cases, just set the TX LWM bit to start Tx in the
+		 * worker handler.
+		 */
+		if (vring->vdev_id == VIRTIO_ID_CONSOLE) {
+			spin_lock_irqsave(&fifo->spin_lock, flags);
+			tm_vdev = fifo->vdev[VIRTIO_ID_CONSOLE];
+			mlxbf_tmfifo_console_output(tm_vdev, vring);
+			spin_unlock_irqrestore(&fifo->spin_lock, flags);
+		} else if (test_and_set_bit(MLXBF_TM_TX_LWM_IRQ,
+					    &fifo->pend_events)) {
+			return true;
+		}
+	}
+
+	schedule_work(&fifo->work);
+
+	return true;
+}
+
+/* Get the array of feature bits for this device. */
+static u64 mlxbf_tmfifo_virtio_get_features(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	return tm_vdev->features;
+}
+
+/* Confirm device features to use. */
+static int mlxbf_tmfifo_virtio_finalize_features(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->features = vdev->features;
+
+	return 0;
+}
+
+/* Free virtqueues found by find_vqs(). */
+static void mlxbf_tmfifo_virtio_del_vqs(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+	struct mlxbf_tmfifo_vring *vring;
+	struct virtqueue *vq;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+
+		/* Release the pending packet. */
+		if (vring->desc)
+			mlxbf_tmfifo_release_pending_pkt(vring);
+		vq = vring->vq;
+		if (vq) {
+			vring->vq = NULL;
+			vring_del_virtqueue(vq);
+		}
+	}
+}
+
+/* Create and initialize the virtual queues. */
+static int mlxbf_tmfifo_virtio_find_vqs(struct virtio_device *vdev,
+					unsigned int nvqs,
+					struct virtqueue *vqs[],
+					vq_callback_t *callbacks[],
+					const char * const names[],
+					const bool *ctx,
+					struct irq_affinity *desc)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+	struct mlxbf_tmfifo_vring *vring;
+	struct virtqueue *vq;
+	int i, ret, size;
+
+	if (nvqs > ARRAY_SIZE(tm_vdev->vrings))
+		return -EINVAL;
+
+	for (i = 0; i < nvqs; ++i) {
+		if (!names[i]) {
+			ret = -EINVAL;
+			goto error;
+		}
+		vring = &tm_vdev->vrings[i];
+
+		/* zero vring */
+		size = vring_size(vring->num, vring->align);
+		memset(vring->va, 0, size);
+		vq = vring_new_virtqueue(i, vring->num, vring->align, vdev,
+					 false, false, vring->va,
+					 mlxbf_tmfifo_virtio_notify,
+					 callbacks[i], names[i]);
+		if (!vq) {
+			dev_err(&vdev->dev, "vring_new_virtqueue failed\n");
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		vqs[i] = vq;
+		vring->vq = vq;
+		vq->priv = vring;
+	}
+
+	return 0;
+
+error:
+	mlxbf_tmfifo_virtio_del_vqs(vdev);
+	return ret;
+}
+
+/* Read the status byte. */
+static u8 mlxbf_tmfifo_virtio_get_status(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	return tm_vdev->status;
+}
+
+/* Write the status byte. */
+static void mlxbf_tmfifo_virtio_set_status(struct virtio_device *vdev,
+					   u8 status)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->status = status;
+}
+
+/* Reset the device. Not much here for now. */
+static void mlxbf_tmfifo_virtio_reset(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->status = 0;
+}
+
+/* Read the value of a configuration field. */
+static void mlxbf_tmfifo_virtio_get(struct virtio_device *vdev,
+				    unsigned int offset,
+				    void *buf,
+				    unsigned int len)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	if (offset + len > sizeof(tm_vdev->config))
+		return;
+
+	memcpy(buf, (u8 *)&tm_vdev->config + offset, len);
+}
+
+/* Write the value of a configuration field. */
+static void mlxbf_tmfifo_virtio_set(struct virtio_device *vdev,
+				    unsigned int offset,
+				    const void *buf,
+				    unsigned int len)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	if (offset + len > sizeof(tm_vdev->config))
+		return;
+
+	memcpy((u8 *)&tm_vdev->config + offset, buf, len);
+}
+
+/*
+ * Nothing to do for now. This function is needed to avoid warnings
+ * when the device is released in device_release().
+ */
+static void tmfifo_virtio_dev_release(struct device *dev)
+{
+}
+
+/* Virtio config operations. */
+static const struct virtio_config_ops mlxbf_tmfifo_virtio_config_ops = {
+	.get_features = mlxbf_tmfifo_virtio_get_features,
+	.finalize_features = mlxbf_tmfifo_virtio_finalize_features,
+	.find_vqs = mlxbf_tmfifo_virtio_find_vqs,
+	.del_vqs = mlxbf_tmfifo_virtio_del_vqs,
+	.reset = mlxbf_tmfifo_virtio_reset,
+	.set_status = mlxbf_tmfifo_virtio_set_status,
+	.get_status = mlxbf_tmfifo_virtio_get_status,
+	.get = mlxbf_tmfifo_virtio_get,
+	.set = mlxbf_tmfifo_virtio_set,
+};
+
+/* Create vdev type in a tmfifo. */
+static int mlxbf_tmfifo_create_vdev(struct device *dev,
+				    struct mlxbf_tmfifo *fifo,
+				    int vdev_id, u64 features,
+				    void *config, u32 size)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	int ret;
+
+	mutex_lock(&fifo->lock);
+
+	tm_vdev = fifo->vdev[vdev_id];
+	if (tm_vdev) {
+		dev_err(dev, "vdev %d already exists\n", vdev_id);
+		ret = -EEXIST;
+		goto fail;
+	}
+
+	tm_vdev = devm_kzalloc(dev, sizeof(*tm_vdev), GFP_KERNEL);
+	if (!tm_vdev) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	tm_vdev->vdev.id.device = vdev_id;
+	tm_vdev->vdev.config = &mlxbf_tmfifo_virtio_config_ops;
+	tm_vdev->vdev.dev.parent = &fifo->pdev->dev;
+	tm_vdev->vdev.dev.release = tmfifo_virtio_dev_release;
+	tm_vdev->features = features;
+	if (config)
+		memcpy(&tm_vdev->config, config, size);
+
+	if (mlxbf_tmfifo_alloc_vrings(fifo, tm_vdev)) {
+		dev_err(dev, "unable to allocate vring\n");
+		ret = -ENOMEM;
+		goto vdev_fail;
+	}
+
+	/* Allocate an output buffer for the console device. */
+	if (vdev_id == VIRTIO_ID_CONSOLE)
+		tm_vdev->tx_buf.buf = devm_kmalloc(dev,
+						   MLXBF_TMFIFO_CON_TX_BUF_SIZE,
+						   GFP_KERNEL);
+	fifo->vdev[vdev_id] = tm_vdev;
+
+	/* Register the virtio device. */
+	ret = register_virtio_device(&tm_vdev->vdev);
+	if (ret) {
+		dev_err(&fifo->pdev->dev, "register_virtio_device failed\n");
+		goto vdev_fail;
+	}
+
+	mutex_unlock(&fifo->lock);
+	return 0;
+
+vdev_fail:
+	mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
+	fifo->vdev[vdev_id] = NULL;
+fail:
+	mutex_unlock(&fifo->lock);
+	return ret;
+}
+
+/* Delete vdev type from a tmfifo. */
+static int mlxbf_tmfifo_delete_vdev(struct mlxbf_tmfifo *fifo, int vdev_id)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	mutex_lock(&fifo->lock);
+
+	/* Unregister vdev. */
+	tm_vdev = fifo->vdev[vdev_id];
+	if (tm_vdev) {
+		unregister_virtio_device(&tm_vdev->vdev);
+		mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
+		fifo->vdev[vdev_id] = NULL;
+	}
+
+	mutex_unlock(&fifo->lock);
+
+	return 0;
+}
+
+/* Read the configured network MAC address from efi variable. */
+static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
+{
+	efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
+	unsigned long size = ETH_ALEN;
+	efi_status_t status;
+	u8 buf[ETH_ALEN];
+
+	status = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size,
+				  buf);
+	if (status == EFI_SUCCESS && size == ETH_ALEN)
+		ether_addr_copy(mac, buf);
+	else
+		memcpy(mac, mlxbf_tmfifo_net_default_mac, ETH_ALEN);
+}
+
+/* Set TmFifo thresholds used to trigger interrupts. */
+static void mlxbf_tmfifo_set_threshold(struct mlxbf_tmfifo *fifo)
+{
+	u64 ctl;
+
+	/* Get Tx FIFO size and set the low/high watermark. */
+	ctl = readq(fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
+	fifo->tx_fifo_size =
+		FIELD_GET(MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK, ctl);
+	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__LWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__LWM_MASK,
+			   fifo->tx_fifo_size / 2);
+	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__HWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__HWM_MASK,
+			   fifo->tx_fifo_size - 1);
+	writeq(ctl, fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
+
+	/* Get Rx FIFO size and set the low/high watermark. */
+	ctl = readq(fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
+	fifo->rx_fifo_size =
+		FIELD_GET(MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK, ctl);
+	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__LWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__LWM_MASK, 0);
+	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__HWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__HWM_MASK, 1);
+	writeq(ctl, fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
+}
+
+static void mlxbf_tmfifo_cleanup(struct mlxbf_tmfifo *fifo)
+{
+	int i;
+
+	fifo->is_ready = false;
+	del_timer_sync(&fifo->timer);
+	mlxbf_tmfifo_disable_irqs(fifo);
+	cancel_work_sync(&fifo->work);
+	for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++)
+		mlxbf_tmfifo_delete_vdev(fifo, i);
+}
+
+/* Probe the TMFIFO. */
+static int mlxbf_tmfifo_probe(struct platform_device *pdev)
+{
+	struct virtio_net_config net_config;
+	struct mlxbf_tmfifo *fifo;
+	struct resource *res;
+	int i, ret;
+
+	fifo = devm_kzalloc(&pdev->dev, sizeof(*fifo), GFP_KERNEL);
+	if (!fifo)
+		return -ENOMEM;
+
+	spin_lock_init(&fifo->spin_lock);
+	INIT_WORK(&fifo->work, mlxbf_tmfifo_work_handler);
+	mutex_init(&fifo->lock);
+
+	/* Get the resource of the Rx FIFO. */
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	fifo->rx_base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(fifo->rx_base))
+		return PTR_ERR(fifo->rx_base);
+
+	/* Get the resource of the Tx FIFO. */
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+	fifo->tx_base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(fifo->tx_base))
+		return PTR_ERR(fifo->tx_base);
+
+	fifo->pdev = pdev;
+	platform_set_drvdata(pdev, fifo);
+
+	timer_setup(&fifo->timer, mlxbf_tmfifo_timer, 0);
+
+	for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
+		fifo->irq_info[i].index = i;
+		fifo->irq_info[i].fifo = fifo;
+		fifo->irq_info[i].irq = platform_get_irq(pdev, i);
+		ret = devm_request_irq(&pdev->dev, fifo->irq_info[i].irq,
+				       mlxbf_tmfifo_irq_handler, 0,
+				       "tmfifo", &fifo->irq_info[i]);
+		if (ret) {
+			dev_err(&pdev->dev, "devm_request_irq failed\n");
+			fifo->irq_info[i].irq = 0;
+			return ret;
+		}
+	}
+
+	mlxbf_tmfifo_set_threshold(fifo);
+
+	/* Create the console vdev. */
+	ret = mlxbf_tmfifo_create_vdev(&pdev->dev, fifo, VIRTIO_ID_CONSOLE, 0,
+				       NULL, 0);
+	if (ret)
+		goto fail;
+
+	/* Create the network vdev. */
+	memset(&net_config, 0, sizeof(net_config));
+	net_config.mtu = ETH_DATA_LEN;
+	net_config.status = VIRTIO_NET_S_LINK_UP;
+	mlxbf_tmfifo_get_cfg_mac(net_config.mac);
+	ret = mlxbf_tmfifo_create_vdev(&pdev->dev, fifo, VIRTIO_ID_NET,
+				       MLXBF_TMFIFO_NET_FEATURES, &net_config,
+				       sizeof(net_config));
+	if (ret)
+		goto fail;
+
+	mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
+
+	fifo->is_ready = true;
+	return 0;
+
+fail:
+	mlxbf_tmfifo_cleanup(fifo);
+	return ret;
+}
+
+/* Device remove function. */
+static int mlxbf_tmfifo_remove(struct platform_device *pdev)
+{
+	struct mlxbf_tmfifo *fifo = platform_get_drvdata(pdev);
+
+	mlxbf_tmfifo_cleanup(fifo);
+
+	return 0;
+}
+
+static const struct acpi_device_id mlxbf_tmfifo_acpi_match[] = {
+	{ "MLNXBF01", 0 },
+	{}
+};
+MODULE_DEVICE_TABLE(acpi, mlxbf_tmfifo_acpi_match);
+
+static struct platform_driver mlxbf_tmfifo_driver = {
+	.probe = mlxbf_tmfifo_probe,
+	.remove = mlxbf_tmfifo_remove,
+	.driver = {
+		.name = "bf-tmfifo",
+		.acpi_match_table = mlxbf_tmfifo_acpi_match,
+	},
+};
+
+module_platform_driver(mlxbf_tmfifo_driver);
+
+MODULE_DESCRIPTION("Mellanox BlueField SoC TmFifo Driver");
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Mellanox Technologies");
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* RE: [PATCH v10] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
  2019-03-06 20:00     ` Liming Sun
@ 2019-03-08 14:44       ` Liming Sun
  0 siblings, 0 replies; 30+ messages in thread
From: Liming Sun @ 2019-03-08 14:44 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak,
	Linux Kernel Mailing List, Platform Driver

Andy,

The v11 has been posted.

Thanks!
Liming

> -----Original Message-----
> From: Liming Sun
> Sent: Wednesday, March 6, 2019 3:01 PM
> To: 'Andy Shevchenko' <andy.shevchenko@gmail.com>
> Cc: David Woods <dwoods@mellanox.com>; Andy Shevchenko <andy@infradead.org>; Darren Hart <dvhart@infradead.org>; Vadim
> Pasternak <vadimp@mellanox.com>; Linux Kernel Mailing List <linux-kernel@vger.kernel.org>; Platform Driver <platform-driver-
> x86@vger.kernel.org>
> Subject: RE: [PATCH v10] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
> 
> Thanks Andy! Please see my response below. If no further comments, I'll try to post v11 after more testing.
> 
> Regards,
> Liming
> 
> > -----Original Message-----
> > From: Andy Shevchenko <andy.shevchenko@gmail.com>
> > Sent: Tuesday, March 5, 2019 10:34 AM
> > To: Liming Sun <lsun@mellanox.com>
> > Cc: David Woods <dwoods@mellanox.com>; Andy Shevchenko <andy@infradead.org>; Darren Hart <dvhart@infradead.org>; Vadim
> > Pasternak <vadimp@mellanox.com>; Linux Kernel Mailing List <linux-kernel@vger.kernel.org>; Platform Driver <platform-driver-
> > x86@vger.kernel.org>
> > Subject: Re: [PATCH v10] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
> >
> > On Thu, Feb 28, 2019 at 5:51 PM Liming Sun <lsun@mellanox.com> wrote:
> > >
> > > This commit adds the TmFifo platform driver for Mellanox BlueField
> > > Soc. TmFifo is a shared FIFO which enables external host machine
> > > to exchange data with the SoC via USB or PCIe. The driver is based
> > > on virtio framework and has console and network access enabled.
> >
> > Thank you for an update.
> >
> > Unfortunately more work is needed. My comments below.
> >
> > > +#define MLXBF_TMFIFO_TX_STS__COUNT_RMASK               GENMASK(8, 0)
> > > +#define MLXBF_TMFIFO_TX_STS__COUNT_MASK                        GENMASK(8, 0)
> >
> > > +#define MLXBF_TMFIFO_TX_CTL__LWM_RMASK                 GENMASK(7, 0)
> > > +#define MLXBF_TMFIFO_TX_CTL__LWM_MASK                  GENMASK(7, 0)
> >
> > > +#define MLXBF_TMFIFO_TX_CTL__HWM_RMASK                 GENMASK(7, 0)
> > > +#define MLXBF_TMFIFO_TX_CTL__HWM_MASK                  GENMASK(15, 8)
> >
> > > +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RMASK         GENMASK(8, 0)
> > > +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK          GENMASK_ULL(40, 32)
> >
> > > +#define MLXBF_TMFIFO_RX_STS__COUNT_RMASK               GENMASK(8, 0)
> > > +#define MLXBF_TMFIFO_RX_STS__COUNT_MASK                        GENMASK(8, 0)
> >
> > > +#define MLXBF_TMFIFO_RX_CTL__LWM_RMASK                 GENMASK(7, 0)
> > > +#define MLXBF_TMFIFO_RX_CTL__LWM_MASK                  GENMASK(7, 0)
> >
> > > +#define MLXBF_TMFIFO_RX_CTL__HWM_RMASK                 GENMASK(7, 0)
> > > +#define MLXBF_TMFIFO_RX_CTL__HWM_MASK                  GENMASK(15, 8)
> >
> > > +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK         GENMASK(8, 0)
> > > +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK          GENMASK_ULL(40, 32)
> >
> > Since two of them have _ULL suffix I'm wondering if you have checked
> > for side effects on the rest, i.e. if you operate with 64-bit variable
> > and use something like ~MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK, it may
> > give you interesting results.
> 
> The running system on the SoC is arm64 where BITS_PER_LONG and
> BITS_PER_LONG_LONG have the same value. In that case, the two macros appear
> to be the same. But you're right, I should use GENMASK_ULL() to be consistent
> and more correct, just in case "CONFIG_64BIT" is not defined somehow.
> 
> Will update it in v11.
> 
> >
> > > +#define MLXBF_TMFIFO_TIMER_INTERVAL            (HZ / 10)
> >
> > > +/**
> > > + * mlxbf_tmfifo_u64 - Union of 64-bit data
> > > + * @data - 64-bit data in host byte order
> > > + * @data_le - 64-bit data in little-endian byte order
> > > + *
> > > + * It's expected to send 64-bit little-endian value (__le64) into the TmFifo.
> > > + * readq() and writeq() expect u64 instead. A union structure is used here
> > > + * to workaround the explicit casting usage like writeq(*(u64 *)&data_le).
> > > + */
> >
> > How do you know what readq()/writeq() does with the data? Is it on all
> > architectures?
> > How the endianess conversion affects the actual data?
> 
> The SoC runs arm64 and supports little endian for now. The FIFO has two sides,
> one side is the SoC, the other side is an external host machine which could
> access the FIFO via USB or PCIe. The rule is that the 'byte stream' stays
> the same when one side writes 8 bytes and the other side reads the same
> 8 bytes. So as long as both sides agree on the byte order, it should
> be fine.
> 
> After double-checking the arm64 readq()/writeq() implementation, it appears
> that these APIs already do the cpu_to_le64()/le64_to_cpu() conversion.
> conversion. There's actually no need to make another conversion
> (and shouldn't do it). I'll remove these conversions in v11. The code will
> look much cleaner.
> 
> >
> > > +union mlxbf_tmfifo_u64 {
> > > +       u64 data;
> > > +       __le64 data_le;
> > > +};
> >
> > > +/*
> > > + * Default MAC.
> > > + * This MAC address will be read from EFI persistent variable if configured.
> > > + * It can also be reconfigured with standard Linux tools.
> > > + */
> > > +static u8 mlxbf_tmfifo_net_default_mac[6] = {
> > > +       0x00, 0x1A, 0xCA, 0xFF, 0xFF, 0x01};
> >
> > > +#define mlxbf_vdev_to_tmfifo(dev)      \
> > > +       container_of(dev, struct mlxbf_tmfifo_vdev, vdev)
> >
> > One line?
> 
> Couldn't fit it into one line within 80 characters.
> (Please correct me if you meant a single line even exceeding 80 characters).
> 
> >
> > > +/* Return the consumed Tx buffer space. */
> > > +static int mlxbf_tmfifo_vdev_tx_buf_len(struct mlxbf_tmfifo_vdev *tm_vdev)
> > > +{
> > > +       int len;
> > > +
> > > +       if (tm_vdev->tx_tail >= tm_vdev->tx_head)
> > > +               len = tm_vdev->tx_tail - tm_vdev->tx_head;
> > > +       else
> > > +               len = MLXBF_TMFIFO_CONS_TX_BUF_SIZE - tm_vdev->tx_head +
> > > +                       tm_vdev->tx_tail;
> > > +       return len;
> > > +}
> >
> > Is this custom implementation of some kind of circ_buf?
> 
> Yes. I'll see if I can re-use the circ_buf structure and update it in v11.
> 
> >
> > > +/* Allocate vrings for the fifo. */
> > > +static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
> > > +                                    struct mlxbf_tmfifo_vdev *tm_vdev)
> > > +{
> > > +       struct mlxbf_tmfifo_vring *vring;
> > > +       struct device *dev;
> > > +       dma_addr_t dma;
> > > +       int i, size;
> > > +       void *va;
> > > +
> > > +       for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
> > > +               vring = &tm_vdev->vrings[i];
> > > +               vring->fifo = fifo;
> > > +               vring->num = MLXBF_TMFIFO_VRING_SIZE;
> > > +               vring->align = SMP_CACHE_BYTES;
> > > +               vring->index = i;
> > > +               vring->vdev_id = tm_vdev->vdev.id.device;
> > > +               dev = &tm_vdev->vdev.dev;
> > > +
> > > +               size = vring_size(vring->num, vring->align);
> > > +               va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL);
> > > +               if (!va) {
> >
> > > +                       dev_err(dev->parent, "dma_alloc_coherent failed\n");
> >
> > And how do you clean previously allocated items?
> 
> Fixed. Check the return value of mlxbf_tmfifo_alloc_vrings() and goto
> 'register_fail' (probably change to a better name) instead of 'fail'.
> In such case the mlxbf_tmfifo_free_vrings() will be called to clean up
> all allocated vrings.
> 
> >
> > > +                       return -ENOMEM;
> > > +               }
> > > +
> > > +               vring->va = va;
> > > +               vring->dma = dma;
> > > +       }
> > > +
> > > +       return 0;
> > > +}
> >
> > > +/* Disable interrupts of the fifo device. */
> > > +static void mlxbf_tmfifo_disable_irqs(struct mlxbf_tmfifo *fifo)
> > > +{
> > > +       int i, irq;
> > > +
> > > +       for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
> > > +               irq = fifo->irq_info[i].irq;
> >
> > > +               if (irq) {
> >
> > I don't think this check is needed if you can guarantee that it has no
> > staled records.
> 
> Yes, it's not needed any more according to the current code.
> Will remove it in v11.
> 
> >
> > > +                       fifo->irq_info[i].irq = 0;
> > > +                       disable_irq(irq);
> > > +               }
> > > +       }
> > > +}
> >
> > > +/* Get the number of available words in the TmFifo for sending. */
> > > +static int mlxbf_tmfifo_get_tx_avail(struct mlxbf_tmfifo *fifo, int vdev_id)
> > > +{
> > > +       int tx_reserve;
> > > +       u64 sts;
> > > +
> > > +       /* Reserve some room in FIFO for console messages. */
> > > +       if (vdev_id == VIRTIO_ID_NET)
> > > +               tx_reserve = fifo->tx_fifo_size / MLXBF_TMFIFO_RESERVE_RATIO;
> > > +       else
> > > +               tx_reserve = 1;
> > > +
> > > +       sts = readq(fifo->tx_base + MLXBF_TMFIFO_TX_STS);
> >
> > > +       return (fifo->tx_fifo_size - tx_reserve -
> > > +               FIELD_GET(MLXBF_TMFIFO_TX_STS__COUNT_MASK, sts));
> >
> > Redundant parens.
> > Moreover, consider
> >
> > u32 count; // or whatever suits for FIELD_GET().
> > ...
> >
> > sts = readq(...);
> > count = FIELD_GET(...);
> > return ...;
> 
> Will update in v11.
> 
> >
> > > +}
> >
> > > +       while (size > 0) {
> > > +               addr = cons->tx_buf + cons->tx_head;
> > > +
> > > +               if (cons->tx_head + sizeof(u64) <=
> > > +                   MLXBF_TMFIFO_CONS_TX_BUF_SIZE) {
> > > +                       memcpy(&data, addr, sizeof(u64));
> > > +               } else {
> > > +                       partial = MLXBF_TMFIFO_CONS_TX_BUF_SIZE - cons->tx_head;
> > > +                       memcpy(&data, addr, partial);
> >
> > > +                       memcpy((u8 *)&data + partial, cons->tx_buf,
> > > +                              sizeof(u64) - partial);
> >
> > Unaligned access?!
> 
> The code here is to build and copy 8 bytes from the buffer into the 'data'
> variable. The source could be unaligned. For example, 3 bytes are at the
> end of the buffer and 5 bytes are at the beginning of the buffer. memcpy()
> is used to do a byte-stream copy, which seems OK. Please correct me if
> I misunderstand the comment.
> 
> >
> > > +               }
> >
> > > +               buf.data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
> > > +               buf.data = le64_to_cpu(buf.data_le);
> >
> > Are you sure this is correct?
> > How did you test this on BE architectures?
> 
> Thanks for the comment! Same as above, the conversion is not really needed.
> I'll remove them in v11. As for testing, we only have arm64 little-endian Linux
> running on the SoC. This conversion doesn't make much difference for the SoC.
> As for BE architectures, we mainly verify the other side of the FIFO, which
> is the external host (for example, a ppc64 machine).
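For reference, readq() is defined to return the FIFO word already converted
to CPU byte order, so the value can be used directly once the explicit
conversion is dropped (a sketch using the names from the patch below):

    u64 data;

    /* No le64_to_cpu() needed; readq() handles the byte order. */
    data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
    memcpy(addr + vring->cur_len, &data, sizeof(u64));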
> 
> >
> > > +       tm_vdev = devm_kzalloc(dev, sizeof(*tm_vdev), GFP_KERNEL);
> >
> > Is it appropriate use of devm_* ?
> 
> This is SoC, the device won't be closed or detached. The only case is when
> the driver is unloaded. So it appears ok to use devm_kzalloc() since it's
> allocated during probe() and released during module unload. Please
> correct me if I misunderstand it.
> 
> >
> > > +       if (!tm_vdev) {
> > > +               ret = -ENOMEM;
> > > +               goto fail;
> > > +       }
> >
> > > +/* Read the configured network MAC address from efi variable. */
> > > +static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
> > > +{
> > > +       efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
> > > +       efi_status_t status;
> > > +       unsigned long size;
> >
> > > +       u8 buf[6];
> >
> > ETH_ALEN ?
> 
> Will update it in v11
> 
> >
> > > +
> > > +       size = sizeof(buf);
> >
> > Ditto.
> 
> Will update it in v11
> 
> >
> > > +       status = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size,
> > > +                                 buf);
> >
> > > +       if (status == EFI_SUCCESS && size == sizeof(buf))
> >
> > Ditto.
> 
> Will update it in v11
> 
> >
> > > +               memcpy(mac, buf, sizeof(buf));
> >
> > ether_addr_copy().
> 
> Will update it in v11
> 
> >
> > > +}
> >
> > > +       memcpy(net_config.mac, mlxbf_tmfifo_net_default_mac, 6);
> >
> > ether_addr_copy()...
> >
> > > +       mlxbf_tmfifo_get_cfg_mac(net_config.mac);
> >
> > ... but actually above should be part of this function.
> 
> Will update it in v11
> 
> >
> > --
> > With Best Regards,
> > Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v11] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
       [not found] <b143b40446c1870fb8d422b364ead95d54552be9.1527264077.git.lsun@mellanox.com>
                   ` (5 preceding siblings ...)
  2019-03-08 14:41 ` [PATCH v11] " Liming Sun
@ 2019-03-26 21:13 ` Liming Sun
  2019-03-28 19:56 ` [PATCH v12] " Liming Sun
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Liming Sun @ 2019-03-26 21:13 UTC (permalink / raw)
  To: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak
  Cc: Liming Sun, linux-kernel, platform-driver-x86

This commit adds the TmFifo platform driver for the Mellanox BlueField
SoC. TmFifo is a shared FIFO which enables an external host machine
to exchange data with the SoC via USB or PCIe. The driver is based
on the virtio framework and has console and network access enabled.

Reviewed-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Liming Sun <lsun@mellanox.com>
---
v11->v11: rebase & resend, no new changes
v10->v11:
    Fixes for comments from Andy:
    - Use GENMASK_ULL() instead of GENMASK() in mlxbf-tmfifo-regs.h
    - Removed the cpu_to_le64()/le64_to_cpu() conversion since
      readq()/writeq() already takes care of it.
    - Remove the "if (irq)" check in mlxbf_tmfifo_disable_irqs().
    - Add "u32 count" temp variable in mlxbf_tmfifo_get_tx_avail().
    - Clean up mlxbf_tmfifo_get_cfg_mac(), use ETH_ALEN instead of
      value 6.
    - Change the tx_buf to use Linux existing 'struct circ_buf'.
    Comment not applied:
    - "Change macro mlxbf_vdev_to_tmfifo() to one line"
      Couldn't fit it on one line within 80 characters
    - "Is it appropriate use of devm_* for 'tm_vdev = devm_kzalloc'"
      This is SoC, the device won't be closed or detached.
      The only case is when the driver is unloaded. So it appears
      ok to use devm_kzalloc() since it's allocated during probe()
      and released during module unload.
    Comments from Vadim: OK
v9->v10:
    Fixes for comments from Andy:
    - Use devm_ioremap_resource() instead of devm_ioremap().
    - Use kernel-doc comments.
    - Keep Makefile contents sorted.
    - Use same fixed format for offsets.
    - Use SZ_1K/SZ_32K instead of 1024/32*1024.
    - Remove unnecessary comments.
    - Use one style for max numbers.
    - More comments for mlxbf_tmfifo_vdev and mlxbf_tmfifo_data_64bit.
    - Use globally defined MTU instead of new definition.
    - Remove forward declaration of mlxbf_tmfifo_remove().
    - Remove PAGE_ALIGN() for dma_alloc_coherent().
    - Remove the cast of "struct vring *".
    - Check return result of test_and_set_bit().
    - Add a macro mlxbf_vdev_to_tmfifo().
    - Several other minor coding style comments.
    Comment not applied:
    - "Shouldn't be rather helper in EFI lib in kernel"
      Looks like efi.get_variable() is the way I found in the kernel
      tree.
    - "this one is not protected anyhow? Potential race condition"
      In mlxbf_tmfifo_console_tx(), the spin-lock is used to protect the
      'tx_buf' only, not the FIFO writes. So there is no race condition.
    - "Is __packed needed in mlxbf_tmfifo_msg_hdr".
      Yes, it is needed to make sure the structure is 8 bytes.
    Fixes for comments from Vadim:
    - Use tab in mlxbf-tmfifo-regs.h
    - Use kernel-doc comments for struct mlxbf_tmfifo_msg_hdr and
      mlxbf_tmfifo_irq_info as well.
    - Use _MAX instead of _CNT in the macro definition to be consistent.
    - Fix the MODULE_LICENSE.
    - Use BIT_ULL() instead of BIT().
    - Remove argument of 'avail' for mlxbf_tmfifo_rxtx_header() and
      mlxbf_tmfifo_rxtx_word()
    - Revise logic in mlxbf_tmfifo_rxtx_one_desc() to remove the
      WARN_ON().
    - Change "union mlxbf_tmfifo_u64 u" to "union mlxbf_tmfifo_u64 buf"
      in mlxbf_tmfifo_rxtx_word().
    - Change date type of vring_change from 'int' to 'bool'.
    - Remove the blank lines after Signed-off.
    - Don’t use declaration in the middle.
    - Make the network header initialization more elegant.
    - Change label done to mlxbf_tmfifo_desc_done.
    - Remove some unnecessary comments, and several other misc coding
      style comments.
    - Simplify code logic in mlxbf_tmfifo_virtio_notify()
    New changes by Liming:
    - Simplify the Rx/Tx function arguments to make it more readable.
v8->v9:
    Fixes for comments from Andy:
    - Use modern devm_xxx() API instead.
    Fixes for comments from Vadim:
    - Split the Rx/Tx function into smaller functions.
    - File name, copyright information.
    - Function and variable name conversion.
    - Local variable and indent coding styles.
    - Remove unnecessary 'inline' declarations.
    - Use devm_xxx() APIs.
    - Move the efi_char16_t MAC address definition to global.
    - Fix warnings reported by 'checkpatch --strict'.
    - Fix warnings reported by 'make CF="-D__CHECK_ENDIAN__"'.
    - Change 'select VIRTIO_xxx' to 'depends on VIRTIO_xxx' in Kconfig.
    - Merge mlxbf_tmfifo_vdev_tx_buf_push() and
      mlxbf_tmfifo_vdev_tx_buf_pop().
    - Add union to avoid casting between __le64 and u64.
    - Several other misc coding style comments.
    New changes by Liming:
    - Removed the DT binding documentation since only ACPI is
      supported for now by UEFI on the SoC.
v8: Re-submit under drivers/platform/mellanox for the target-side
    platform driver only.
v7: Added host side drivers into the same patch set.
v5~v6: Coding style fix.
v1~v4: Initial version for directory drivers/soc/mellanox.
---
 drivers/platform/mellanox/Kconfig             |   12 +-
 drivers/platform/mellanox/Makefile            |    1 +
 drivers/platform/mellanox/mlxbf-tmfifo-regs.h |   63 ++
 drivers/platform/mellanox/mlxbf-tmfifo.c      | 1286 +++++++++++++++++++++++++
 4 files changed, 1361 insertions(+), 1 deletion(-)
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo-regs.h
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo.c

diff --git a/drivers/platform/mellanox/Kconfig b/drivers/platform/mellanox/Kconfig
index cd8a908..530fe7e 100644
--- a/drivers/platform/mellanox/Kconfig
+++ b/drivers/platform/mellanox/Kconfig
@@ -5,7 +5,7 @@
 
 menuconfig MELLANOX_PLATFORM
 	bool "Platform support for Mellanox hardware"
-	depends on X86 || ARM || COMPILE_TEST
+	depends on X86 || ARM || ARM64 || COMPILE_TEST
 	---help---
 	  Say Y here to get to see options for platform support for
 	  Mellanox systems. This option alone does not add any kernel code.
@@ -34,4 +34,14 @@ config MLXREG_IO
 	  to system resets operation, system reset causes monitoring and some
 	  kinds of mux selection.
 
+config MLXBF_TMFIFO
+	tristate "Mellanox BlueField SoC TmFifo platform driver"
+	depends on ARM64
+	depends on ACPI
+	depends on VIRTIO_CONSOLE && VIRTIO_NET
+	help
+	  Say y here to enable TmFifo support. The TmFifo driver provides
+          platform driver support for the TmFifo which supports console
+          and networking based on the virtio framework.
+
 endif # MELLANOX_PLATFORM
diff --git a/drivers/platform/mellanox/Makefile b/drivers/platform/mellanox/Makefile
index 57074d9c..a229bda1 100644
--- a/drivers/platform/mellanox/Makefile
+++ b/drivers/platform/mellanox/Makefile
@@ -3,5 +3,6 @@
 # Makefile for linux/drivers/platform/mellanox
 # Mellanox Platform-Specific Drivers
 #
+obj-$(CONFIG_MLXBF_TMFIFO)	+= mlxbf-tmfifo.o
 obj-$(CONFIG_MLXREG_HOTPLUG)	+= mlxreg-hotplug.o
 obj-$(CONFIG_MLXREG_IO) += mlxreg-io.o
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo-regs.h b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
new file mode 100644
index 0000000..e4f0d2e
--- /dev/null
+++ b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
+ */
+
+#ifndef __MLXBF_TMFIFO_REGS_H__
+#define __MLXBF_TMFIFO_REGS_H__
+
+#include <linux/types.h>
+#include <linux/bits.h>
+
+#define MLXBF_TMFIFO_TX_DATA				0x00
+#define MLXBF_TMFIFO_TX_STS				0x08
+#define MLXBF_TMFIFO_TX_STS__LENGTH			0x0001
+#define MLXBF_TMFIFO_TX_STS__COUNT_SHIFT		0
+#define MLXBF_TMFIFO_TX_STS__COUNT_WIDTH		9
+#define MLXBF_TMFIFO_TX_STS__COUNT_RESET_VAL		0
+#define MLXBF_TMFIFO_TX_STS__COUNT_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_TX_STS__COUNT_MASK			GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_TX_CTL				0x10
+#define MLXBF_TMFIFO_TX_CTL__LENGTH			0x0001
+#define MLXBF_TMFIFO_TX_CTL__LWM_SHIFT			0
+#define MLXBF_TMFIFO_TX_CTL__LWM_WIDTH			8
+#define MLXBF_TMFIFO_TX_CTL__LWM_RESET_VAL		128
+#define MLXBF_TMFIFO_TX_CTL__LWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__LWM_MASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__HWM_SHIFT			8
+#define MLXBF_TMFIFO_TX_CTL__HWM_WIDTH			8
+#define MLXBF_TMFIFO_TX_CTL__HWM_RESET_VAL		128
+#define MLXBF_TMFIFO_TX_CTL__HWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__HWM_MASK			GENMASK_ULL(15, 8)
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_SHIFT		32
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_WIDTH		9
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RESET_VAL	256
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK		GENMASK_ULL(40, 32)
+#define MLXBF_TMFIFO_RX_DATA				0x00
+#define MLXBF_TMFIFO_RX_STS				0x08
+#define MLXBF_TMFIFO_RX_STS__LENGTH			0x0001
+#define MLXBF_TMFIFO_RX_STS__COUNT_SHIFT		0
+#define MLXBF_TMFIFO_RX_STS__COUNT_WIDTH		9
+#define MLXBF_TMFIFO_RX_STS__COUNT_RESET_VAL		0
+#define MLXBF_TMFIFO_RX_STS__COUNT_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_RX_STS__COUNT_MASK			GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_RX_CTL				0x10
+#define MLXBF_TMFIFO_RX_CTL__LENGTH			0x0001
+#define MLXBF_TMFIFO_RX_CTL__LWM_SHIFT			0
+#define MLXBF_TMFIFO_RX_CTL__LWM_WIDTH			8
+#define MLXBF_TMFIFO_RX_CTL__LWM_RESET_VAL		128
+#define MLXBF_TMFIFO_RX_CTL__LWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__LWM_MASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__HWM_SHIFT			8
+#define MLXBF_TMFIFO_RX_CTL__HWM_WIDTH			8
+#define MLXBF_TMFIFO_RX_CTL__HWM_RESET_VAL		128
+#define MLXBF_TMFIFO_RX_CTL__HWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__HWM_MASK			GENMASK_ULL(15, 8)
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_SHIFT		32
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_WIDTH		9
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RESET_VAL	256
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK		GENMASK_ULL(40, 32)
+
+#endif /* !defined(__MLXBF_TMFIFO_REGS_H__) */
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo.c b/drivers/platform/mellanox/mlxbf-tmfifo.c
new file mode 100644
index 0000000..0a31ffa
--- /dev/null
+++ b/drivers/platform/mellanox/mlxbf-tmfifo.c
@@ -0,0 +1,1286 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Mellanox BlueField SoC TmFifo driver
+ *
+ * Copyright (C) 2019 Mellanox Technologies
+ */
+
+#include <linux/acpi.h>
+#include <linux/bitfield.h>
+#include <linux/circ_buf.h>
+#include <linux/efi.h>
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/platform_device.h>
+#include <linux/types.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_console.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_net.h>
+#include <linux/virtio_ring.h>
+
+#include "mlxbf-tmfifo-regs.h"
+
+/* Vring size. */
+#define MLXBF_TMFIFO_VRING_SIZE			SZ_1K
+
+/* Console Tx buffer size. */
+#define MLXBF_TMFIFO_CON_TX_BUF_SIZE		SZ_32K
+
+/* Console Tx buffer reserved space. */
+#define MLXBF_TMFIFO_CON_TX_BUF_RSV_SIZE	8
+
+/* House-keeping timer interval. */
+#define MLXBF_TMFIFO_TIMER_INTERVAL		(HZ / 10)
+
+/* Virtual devices sharing the TM FIFO. */
+#define MLXBF_TMFIFO_VDEV_MAX		(VIRTIO_ID_CONSOLE + 1)
+
+/*
+ * Reserve 1/16 of TmFifo space, so console messages are not starved by
+ * the networking traffic.
+ */
+#define MLXBF_TMFIFO_RESERVE_RATIO		16
+
+/* Message with data needs at least two words (for header & data). */
+#define MLXBF_TMFIFO_DATA_MIN_WORDS		2
+
+struct mlxbf_tmfifo;
+
+/**
+ * mlxbf_tmfifo_vring - Structure of the TmFifo virtual ring
+ * @va: virtual address of the ring
+ * @dma: dma address of the ring
+ * @vq: pointer to the virtio virtqueue
+ * @desc: current descriptor of the pending packet
+ * @desc_head: head descriptor of the pending packet
+ * @cur_len: processed length of the current descriptor
+ * @rem_len: remaining length of the pending packet
+ * @pkt_len: total length of the pending packet
+ * @next_avail: next avail descriptor id
+ * @num: vring size (number of descriptors)
+ * @align: vring alignment size
+ * @index: vring index
+ * @vdev_id: vring virtio id (VIRTIO_ID_xxx)
+ * @fifo: pointer to the tmfifo structure
+ */
+struct mlxbf_tmfifo_vring {
+	void *va;
+	dma_addr_t dma;
+	struct virtqueue *vq;
+	struct vring_desc *desc;
+	struct vring_desc *desc_head;
+	int cur_len;
+	int rem_len;
+	u32 pkt_len;
+	u16 next_avail;
+	int num;
+	int align;
+	int index;
+	int vdev_id;
+	struct mlxbf_tmfifo *fifo;
+};
+
+/* Interrupt types. */
+enum {
+	MLXBF_TM_RX_LWM_IRQ,
+	MLXBF_TM_RX_HWM_IRQ,
+	MLXBF_TM_TX_LWM_IRQ,
+	MLXBF_TM_TX_HWM_IRQ,
+	MLXBF_TM_MAX_IRQ
+};
+
+/* Ring types (Rx & Tx). */
+enum {
+	MLXBF_TMFIFO_VRING_RX,
+	MLXBF_TMFIFO_VRING_TX,
+	MLXBF_TMFIFO_VRING_MAX
+};
+
+/**
+ * mlxbf_tmfifo_vdev - Structure of the TmFifo virtual device
+ * @vdev: virtio device, in which the vdev.id.device field has the
+ *        VIRTIO_ID_xxx id to distinguish the virtual device.
+ * @status: status of the device
+ * @features: supported features of the device
+ * @vrings: array of tmfifo vrings of this device
+ * @config.cons: virtual console config -
+ *               select if vdev.id.device is VIRTIO_ID_CONSOLE
+ * @config.net: virtual network config -
+ *              select if vdev.id.device is VIRTIO_ID_NET
+ * @tx_buf: tx buffer used to buffer data before writing into the FIFO
+ */
+struct mlxbf_tmfifo_vdev {
+	struct virtio_device vdev;
+	u8 status;
+	u64 features;
+	struct mlxbf_tmfifo_vring vrings[MLXBF_TMFIFO_VRING_MAX];
+	union {
+		struct virtio_console_config cons;
+		struct virtio_net_config net;
+	} config;
+	struct circ_buf tx_buf;
+};
+
+/**
+ * mlxbf_tmfifo_irq_info - Structure of the interrupt information
+ * @fifo: pointer to the tmfifo structure
+ * @irq: interrupt number
+ * @index: index into the interrupt array
+ */
+struct mlxbf_tmfifo_irq_info {
+	struct mlxbf_tmfifo *fifo;
+	int irq;
+	int index;
+};
+
+/**
+ * mlxbf_tmfifo - Structure of the TmFifo
+ * @vdev: array of the virtual devices running over the TmFifo
+ * @pdev: platform device
+ * @lock: lock to protect the TmFifo access
+ * @rx_base: mapped register base address for the Rx fifo
+ * @tx_base: mapped register base address for the Tx fifo
+ * @rx_fifo_size: number of entries of the Rx fifo
+ * @tx_fifo_size: number of entries of the Tx fifo
+ * @pend_events: pending bits for deferred events
+ * @irq_info: interrupt information
+ * @work: work struct for deferred process
+ * @timer: background timer
+ * @vring: Tx/Rx ring
+ * @spin_lock: spin lock
+ * @is_ready: ready flag
+ */
+struct mlxbf_tmfifo {
+	struct mlxbf_tmfifo_vdev *vdev[MLXBF_TMFIFO_VDEV_MAX];
+	struct platform_device *pdev;
+	struct mutex lock;		/* TmFifo lock */
+	void __iomem *rx_base;
+	void __iomem *tx_base;
+	int rx_fifo_size;
+	int tx_fifo_size;
+	unsigned long pend_events;
+	struct mlxbf_tmfifo_irq_info irq_info[MLXBF_TM_MAX_IRQ];
+	struct work_struct work;
+	struct timer_list timer;
+	struct mlxbf_tmfifo_vring *vring[2];
+	spinlock_t spin_lock;		/* spin lock */
+	bool is_ready;
+};
+
+/**
+ * mlxbf_tmfifo_msg_hdr - Structure of the TmFifo message header
+ * @type: message type
+ * @len: payload length
+ * @data: 64-bit union data
+ */
+union mlxbf_tmfifo_msg_hdr {
+	struct {
+		u8 type;
+		__be16 len;
+		u8 unused[5];
+	} __packed;
+	u64 data;
+};
+
+/*
+ * Default MAC.
+ * This MAC address will be read from EFI persistent variable if configured.
+ * It can also be reconfigured with standard Linux tools.
+ */
+static u8 mlxbf_tmfifo_net_default_mac[ETH_ALEN] = {
+	0x00, 0x1A, 0xCA, 0xFF, 0xFF, 0x01};
+
+/* EFI variable name of the MAC address. */
+static efi_char16_t mlxbf_tmfifo_efi_name[] = L"RshimMacAddr";
+
+/* Maximum L2 header length. */
+#define MLXBF_TMFIFO_NET_L2_OVERHEAD	36
+
+/* Supported virtio-net features. */
+#define MLXBF_TMFIFO_NET_FEATURES	(BIT_ULL(VIRTIO_NET_F_MTU) | \
+					 BIT_ULL(VIRTIO_NET_F_STATUS) | \
+					 BIT_ULL(VIRTIO_NET_F_MAC))
+
+#define mlxbf_vdev_to_tmfifo(dev)	\
+	container_of(dev, struct mlxbf_tmfifo_vdev, vdev)
+
+/* Allocate vrings for the fifo. */
+static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
+				     struct mlxbf_tmfifo_vdev *tm_vdev)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	struct device *dev;
+	dma_addr_t dma;
+	int i, size;
+	void *va;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+		vring->fifo = fifo;
+		vring->num = MLXBF_TMFIFO_VRING_SIZE;
+		vring->align = SMP_CACHE_BYTES;
+		vring->index = i;
+		vring->vdev_id = tm_vdev->vdev.id.device;
+		dev = &tm_vdev->vdev.dev;
+
+		size = vring_size(vring->num, vring->align);
+		va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL);
+		if (!va) {
+			dev_err(dev->parent, "dma_alloc_coherent failed\n");
+			return -ENOMEM;
+		}
+
+		vring->va = va;
+		vring->dma = dma;
+	}
+
+	return 0;
+}
+
+/* Free vrings of the fifo device. */
+static void mlxbf_tmfifo_free_vrings(struct mlxbf_tmfifo *fifo,
+				     struct mlxbf_tmfifo_vdev *tm_vdev)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	int i, size;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+		if (vring->va) {
+			size = vring_size(vring->num, vring->align);
+			dma_free_coherent(tm_vdev->vdev.dev.parent, size,
+					  vring->va, vring->dma);
+			vring->va = NULL;
+			if (vring->vq) {
+				vring_del_virtqueue(vring->vq);
+				vring->vq = NULL;
+			}
+		}
+	}
+}
+
+/* Disable interrupts of the fifo device. */
+static void mlxbf_tmfifo_disable_irqs(struct mlxbf_tmfifo *fifo)
+{
+	int i, irq;
+
+	for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
+		irq = fifo->irq_info[i].irq;
+		fifo->irq_info[i].irq = 0;
+		disable_irq(irq);
+	}
+}
+
+/* Interrupt handler. */
+static irqreturn_t mlxbf_tmfifo_irq_handler(int irq, void *arg)
+{
+	struct mlxbf_tmfifo_irq_info *irq_info = arg;
+
+	if (irq_info->index < MLXBF_TM_MAX_IRQ &&
+	    !test_and_set_bit(irq_info->index, &irq_info->fifo->pend_events))
+		schedule_work(&irq_info->fifo->work);
+
+	return IRQ_HANDLED;
+}
+
+/* Get the next packet descriptor from the vring. */
+static struct vring_desc *
+mlxbf_tmfifo_get_next_desc(struct mlxbf_tmfifo_vring *vring)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	unsigned int idx, head;
+
+	if (vring->next_avail == virtio16_to_cpu(vdev, vr->avail->idx))
+		return NULL;
+
+	idx = vring->next_avail % vr->num;
+	head = virtio16_to_cpu(vdev, vr->avail->ring[idx]);
+	if (WARN_ON(head >= vr->num))
+		return NULL;
+
+	vring->next_avail++;
+
+	return &vr->desc[head];
+}
+
+/* Release virtio descriptor. */
+static void mlxbf_tmfifo_release_desc(struct mlxbf_tmfifo_vring *vring,
+				      struct vring_desc *desc, u32 len)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	u16 idx, vr_idx;
+
+	vr_idx = virtio16_to_cpu(vdev, vr->used->idx);
+	idx = vr_idx % vr->num;
+	vr->used->ring[idx].id = cpu_to_virtio32(vdev, desc - vr->desc);
+	vr->used->ring[idx].len = cpu_to_virtio32(vdev, len);
+
+	/*
+	 * Virtio could poll and check the 'idx' to decide whether the desc is
+	 * done or not. Add a memory barrier here to make sure the update above
+	 * completes before updating the idx.
+	 */
+	mb();
+	vr->used->idx = cpu_to_virtio16(vdev, vr_idx + 1);
+}
+
+/* Get the total length of the descriptor chain. */
+static u32 mlxbf_tmfifo_get_pkt_len(struct mlxbf_tmfifo_vring *vring,
+				    struct vring_desc *desc)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	u32 len = 0, idx;
+
+	while (desc) {
+		len += virtio32_to_cpu(vdev, desc->len);
+		if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
+			break;
+		idx = virtio16_to_cpu(vdev, desc->next);
+		desc = &vr->desc[idx];
+	}
+
+	return len;
+}
+
+static void mlxbf_tmfifo_release_pending_pkt(struct mlxbf_tmfifo_vring *vring)
+{
+	struct vring_desc *desc_head;
+	u32 len = 0;
+
+	if (vring->desc_head) {
+		desc_head = vring->desc_head;
+		len = vring->pkt_len;
+	} else {
+		desc_head = mlxbf_tmfifo_get_next_desc(vring);
+		if (desc_head)
+			len = mlxbf_tmfifo_get_pkt_len(vring, desc_head);
+	}
+
+	if (desc_head)
+		mlxbf_tmfifo_release_desc(vring, desc_head, len);
+
+	vring->pkt_len = 0;
+	vring->desc = NULL;
+	vring->desc_head = NULL;
+}
+
+static void mlxbf_tmfifo_init_net_desc(struct mlxbf_tmfifo_vring *vring,
+				       struct vring_desc *desc, bool is_rx)
+{
+	struct virtio_device *vdev = vring->vq->vdev;
+	struct virtio_net_hdr *net_hdr;
+
+	net_hdr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+	memset(net_hdr, 0, sizeof(*net_hdr));
+}
+
+/* Get and initialize the next packet. */
+static struct vring_desc *
+mlxbf_tmfifo_get_next_pkt(struct mlxbf_tmfifo_vring *vring, bool is_rx)
+{
+	struct vring_desc *desc;
+
+	desc = mlxbf_tmfifo_get_next_desc(vring);
+	if (desc && is_rx && vring->vdev_id == VIRTIO_ID_NET)
+		mlxbf_tmfifo_init_net_desc(vring, desc, is_rx);
+
+	vring->desc_head = desc;
+	vring->desc = desc;
+
+	return desc;
+}
+
+/* House-keeping timer. */
+static void mlxbf_tmfifo_timer(struct timer_list *arg)
+{
+	struct mlxbf_tmfifo *fifo = container_of(arg, struct mlxbf_tmfifo,
+						 timer);
+	int more;
+
+	more = !test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events) ||
+		    !test_and_set_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events);
+
+	if (more)
+		schedule_work(&fifo->work);
+
+	mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
+}
+
+/* Copy one console packet into the output buffer. */
+static void mlxbf_tmfifo_console_output_one(struct mlxbf_tmfifo_vdev *cons,
+					    struct mlxbf_tmfifo_vring *vring,
+					    struct vring_desc *desc)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = &cons->vdev;
+	u32 len, idx, seg;
+	void *addr;
+
+	while (desc) {
+		addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+		len = virtio32_to_cpu(vdev, desc->len);
+
+		seg = CIRC_SPACE_TO_END(cons->tx_buf.head, cons->tx_buf.tail,
+					MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+		if (len <= seg) {
+			memcpy(cons->tx_buf.buf + cons->tx_buf.head, addr, len);
+		} else {
+			memcpy(cons->tx_buf.buf + cons->tx_buf.head, addr, seg);
+			addr += seg;
+			memcpy(cons->tx_buf.buf, addr, len - seg);
+		}
+		cons->tx_buf.head = (cons->tx_buf.head + len) %
+			MLXBF_TMFIFO_CON_TX_BUF_SIZE;
+
+		if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
+			break;
+		idx = virtio16_to_cpu(vdev, desc->next);
+		desc = &vr->desc[idx];
+	}
+}
+
+/* Copy console data into the output buffer. */
+static void mlxbf_tmfifo_console_output(struct mlxbf_tmfifo_vdev *cons,
+					struct mlxbf_tmfifo_vring *vring)
+{
+	struct vring_desc *desc;
+	u32 len, avail;
+
+	desc = mlxbf_tmfifo_get_next_desc(vring);
+	while (desc) {
+		/* Release the packet if not enough space. */
+		len = mlxbf_tmfifo_get_pkt_len(vring, desc);
+		avail = CIRC_SPACE(cons->tx_buf.head, cons->tx_buf.tail,
+				   MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+		if (len + MLXBF_TMFIFO_CON_TX_BUF_RSV_SIZE > avail) {
+			mlxbf_tmfifo_release_desc(vring, desc, len);
+			break;
+		}
+
+		mlxbf_tmfifo_console_output_one(cons, vring, desc);
+		mlxbf_tmfifo_release_desc(vring, desc, len);
+		desc = mlxbf_tmfifo_get_next_desc(vring);
+	}
+}
+
+/* Get the number of available words in Rx FIFO for receiving. */
+static int mlxbf_tmfifo_get_rx_avail(struct mlxbf_tmfifo *fifo)
+{
+	u64 sts;
+
+	sts = readq(fifo->rx_base + MLXBF_TMFIFO_RX_STS);
+	return FIELD_GET(MLXBF_TMFIFO_RX_STS__COUNT_MASK, sts);
+}
+
+/* Get the number of available words in the TmFifo for sending. */
+static int mlxbf_tmfifo_get_tx_avail(struct mlxbf_tmfifo *fifo, int vdev_id)
+{
+	int tx_reserve;
+	u32 count;
+	u64 sts;
+
+	/* Reserve some room in FIFO for console messages. */
+	if (vdev_id == VIRTIO_ID_NET)
+		tx_reserve = fifo->tx_fifo_size / MLXBF_TMFIFO_RESERVE_RATIO;
+	else
+		tx_reserve = 1;
+
+	sts = readq(fifo->tx_base + MLXBF_TMFIFO_TX_STS);
+	count = FIELD_GET(MLXBF_TMFIFO_TX_STS__COUNT_MASK, sts);
+	return fifo->tx_fifo_size - tx_reserve - count;
+}
+
+/* Console Tx (move data from the output buffer into the TmFifo). */
+static void mlxbf_tmfifo_console_tx(struct mlxbf_tmfifo *fifo, int avail)
+{
+	union mlxbf_tmfifo_msg_hdr hdr;
+	struct mlxbf_tmfifo_vdev *cons;
+	unsigned long flags;
+	int size, seg;
+	void *addr;
+	u64 data;
+
+	/* Return if not enough space available. */
+	if (avail < MLXBF_TMFIFO_DATA_MIN_WORDS)
+		return;
+
+	cons = fifo->vdev[VIRTIO_ID_CONSOLE];
+	if (!cons || !cons->tx_buf.buf)
+		return;
+
+	/* Return if no data to send. */
+	size = CIRC_CNT(cons->tx_buf.head, cons->tx_buf.tail,
+			MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+	if (size == 0)
+		return;
+
+	/* Adjust the size to available space. */
+	if (size + sizeof(hdr) > avail * sizeof(u64))
+		size = avail * sizeof(u64) - sizeof(hdr);
+
+	/* Write header. */
+	hdr.data = 0;
+	hdr.type = VIRTIO_ID_CONSOLE;
+	hdr.len = htons(size);
+	writeq(hdr.data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+
+	/* Use spin-lock to protect the 'cons->tx_buf'. */
+	spin_lock_irqsave(&fifo->spin_lock, flags);
+
+	while (size > 0) {
+		addr = cons->tx_buf.buf + cons->tx_buf.tail;
+
+		seg = CIRC_CNT_TO_END(cons->tx_buf.head, cons->tx_buf.tail,
+				      MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+		if (seg >= sizeof(u64)) {
+			memcpy(&data, addr, sizeof(u64));
+		} else {
+			memcpy(&data, addr, seg);
+			memcpy((u8 *)&data + seg, cons->tx_buf.buf,
+			       sizeof(u64) - seg);
+		}
+		writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+
+		if (size >= sizeof(u64)) {
+			cons->tx_buf.tail = (cons->tx_buf.tail + sizeof(u64)) %
+				MLXBF_TMFIFO_CON_TX_BUF_SIZE;
+			size -= sizeof(u64);
+		} else {
+			cons->tx_buf.tail = (cons->tx_buf.tail + size) %
+				MLXBF_TMFIFO_CON_TX_BUF_SIZE;
+			size = 0;
+		}
+	}
+
+	spin_unlock_irqrestore(&fifo->spin_lock, flags);
+}
+
+/* Rx/Tx one word in the descriptor buffer. */
+static void mlxbf_tmfifo_rxtx_word(struct mlxbf_tmfifo_vring *vring,
+				   struct vring_desc *desc,
+				   bool is_rx, int len)
+{
+	struct virtio_device *vdev = vring->vq->vdev;
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	void *addr;
+	u64 data;
+
+	/* Get the buffer address of this desc. */
+	addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+
+	/* Read a word from FIFO for Rx. */
+	if (is_rx)
+		data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
+
+	if (vring->cur_len + sizeof(u64) <= len) {
+		/* The whole word. */
+		if (is_rx)
+			memcpy(addr + vring->cur_len, &data, sizeof(u64));
+		else
+			memcpy(&data, addr + vring->cur_len, sizeof(u64));
+		vring->cur_len += sizeof(u64);
+	} else {
+		/* Leftover bytes. */
+		if (is_rx)
+			memcpy(addr + vring->cur_len, &data,
+			       len - vring->cur_len);
+		else
+			memcpy(&data, addr + vring->cur_len,
+			       len - vring->cur_len);
+		vring->cur_len = len;
+	}
+
+	/* Write the word into FIFO for Tx. */
+	if (!is_rx)
+		writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+}
+
+/*
+ * Rx/Tx packet header.
+ *
+ * In Rx case, the packet might be found to belong to a different vring since
+ * the TmFifo is shared by different services. In such case, the 'vring_change'
+ * flag is set.
+ */
+static void mlxbf_tmfifo_rxtx_header(struct mlxbf_tmfifo_vring *vring,
+				     struct vring_desc *desc,
+				     bool is_rx, bool *vring_change)
+{
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	struct virtio_net_config *config;
+	union mlxbf_tmfifo_msg_hdr hdr;
+	int vdev_id, hdr_len;
+
+	/* Read/Write packet header. */
+	if (is_rx) {
+		/* Drain one word from the FIFO. */
+		hdr.data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
+
+		/* Skip the length 0 packets (keepalive). */
+		if (hdr.len == 0)
+			return;
+
+		/* Check packet type. */
+		if (hdr.type == VIRTIO_ID_NET) {
+			vdev_id = VIRTIO_ID_NET;
+			hdr_len = sizeof(struct virtio_net_hdr);
+			config = &fifo->vdev[vdev_id]->config.net;
+			if (ntohs(hdr.len) > config->mtu +
+			    MLXBF_TMFIFO_NET_L2_OVERHEAD)
+				return;
+		} else {
+			vdev_id = VIRTIO_ID_CONSOLE;
+			hdr_len = 0;
+		}
+
+		/*
+		 * Check whether the new packet still belongs to this vring.
+		 * If not, update the pkt_len of the new vring.
+		 */
+		if (vdev_id != vring->vdev_id) {
+			struct mlxbf_tmfifo_vdev *tm_dev2 = fifo->vdev[vdev_id];
+
+			if (!tm_dev2)
+				return;
+			vring->desc = desc;
+			vring = &tm_dev2->vrings[MLXBF_TMFIFO_VRING_RX];
+			*vring_change = true;
+		}
+		vring->pkt_len = ntohs(hdr.len) + hdr_len;
+	} else {
+		/* Network virtio has an extra header. */
+		hdr_len = (vring->vdev_id == VIRTIO_ID_NET) ?
+			   sizeof(struct virtio_net_hdr) : 0;
+		vring->pkt_len = mlxbf_tmfifo_get_pkt_len(vring, desc);
+		hdr.data = 0;
+		hdr.type = (vring->vdev_id == VIRTIO_ID_NET) ?
+			    VIRTIO_ID_NET : VIRTIO_ID_CONSOLE;
+		hdr.len = htons(vring->pkt_len - hdr_len);
+		writeq(hdr.data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+	}
+
+	vring->cur_len = hdr_len;
+	vring->rem_len = vring->pkt_len;
+	fifo->vring[is_rx] = vring;
+}
+
+/*
+ * Rx/Tx one descriptor.
+ *
+ * Return true to indicate more data available.
+ */
+static bool mlxbf_tmfifo_rxtx_one_desc(struct mlxbf_tmfifo_vring *vring,
+				       bool is_rx, int *avail)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	struct virtio_device *vdev;
+	bool vring_change = false;
+	struct vring_desc *desc;
+	unsigned long flags;
+	u32 len, idx;
+
+	vdev = &fifo->vdev[vring->vdev_id]->vdev;
+
+	/* Get the descriptor of the next packet. */
+	if (!vring->desc) {
+		desc = mlxbf_tmfifo_get_next_pkt(vring, is_rx);
+		if (!desc)
+			return false;
+	} else {
+		desc = vring->desc;
+	}
+
+	/* Beginning of a packet. Start to Rx/Tx packet header. */
+	if (vring->pkt_len == 0) {
+		mlxbf_tmfifo_rxtx_header(vring, desc, is_rx, &vring_change);
+		(*avail)--;
+
+		/* Return if new packet is for another ring. */
+		if (vring_change)
+			return false;
+		goto mlxbf_tmfifo_desc_done;
+	}
+
+	/* Get the length of this desc. */
+	len = virtio32_to_cpu(vdev, desc->len);
+	if (len > vring->rem_len)
+		len = vring->rem_len;
+
+	/* Rx/Tx one word (8 bytes) if not done. */
+	if (vring->cur_len < len) {
+		mlxbf_tmfifo_rxtx_word(vring, desc, is_rx, len);
+		(*avail)--;
+	}
+
+	/* Check again whether it's done. */
+	if (vring->cur_len == len) {
+		vring->cur_len = 0;
+		vring->rem_len -= len;
+
+		/* Get the next desc on the chain. */
+		if (vring->rem_len > 0 &&
+		    (virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT)) {
+			idx = virtio16_to_cpu(vdev, desc->next);
+			desc = &vr->desc[idx];
+			goto mlxbf_tmfifo_desc_done;
+		}
+
+		/* Done and release the pending packet. */
+		mlxbf_tmfifo_release_pending_pkt(vring);
+		desc = NULL;
+		fifo->vring[is_rx] = NULL;
+
+		/* Notify upper layer that packet is done. */
+		spin_lock_irqsave(&fifo->spin_lock, flags);
+		vring_interrupt(0, vring->vq);
+		spin_unlock_irqrestore(&fifo->spin_lock, flags);
+	}
+
+mlxbf_tmfifo_desc_done:
+	/* Save the current desc. */
+	vring->desc = desc;
+
+	return true;
+}
+
+/* Rx & Tx processing of a queue. */
+static void mlxbf_tmfifo_rxtx(struct mlxbf_tmfifo_vring *vring, bool is_rx)
+{
+	int avail = 0, devid = vring->vdev_id;
+	struct mlxbf_tmfifo *fifo;
+	bool more;
+
+	fifo = vring->fifo;
+
+	/* Return if vdev is not ready. */
+	if (!fifo->vdev[devid])
+		return;
+
+	/* Return if another vring is running. */
+	if (fifo->vring[is_rx] && fifo->vring[is_rx] != vring)
+		return;
+
+	/* Only handle console and network for now. */
+	if (WARN_ON(devid != VIRTIO_ID_NET && devid != VIRTIO_ID_CONSOLE))
+		return;
+
+	do {
+		/* Get available FIFO space. */
+		if (avail == 0) {
+			if (is_rx)
+				avail = mlxbf_tmfifo_get_rx_avail(fifo);
+			else
+				avail = mlxbf_tmfifo_get_tx_avail(fifo, devid);
+			if (avail <= 0)
+				break;
+		}
+
+		/* Console output always comes from the Tx buffer. */
+		if (!is_rx && devid == VIRTIO_ID_CONSOLE) {
+			mlxbf_tmfifo_console_tx(fifo, avail);
+			break;
+		}
+
+		/* Handle one descriptor. */
+		more = mlxbf_tmfifo_rxtx_one_desc(vring, is_rx, &avail);
+	} while (more);
+}
+
+/* Handle Rx or Tx queues. */
+static void mlxbf_tmfifo_work_rxtx(struct mlxbf_tmfifo *fifo, int queue_id,
+				   int irq_id, bool is_rx)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct mlxbf_tmfifo_vring *vring;
+	int i;
+
+	if (!test_and_clear_bit(irq_id, &fifo->pend_events) ||
+	    !fifo->irq_info[irq_id].irq)
+		return;
+
+	for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++) {
+		tm_vdev = fifo->vdev[i];
+		if (tm_vdev) {
+			vring = &tm_vdev->vrings[queue_id];
+			if (vring->vq)
+				mlxbf_tmfifo_rxtx(vring, is_rx);
+		}
+	}
+}
+
+/* Work handler for Rx and Tx case. */
+static void mlxbf_tmfifo_work_handler(struct work_struct *work)
+{
+	struct mlxbf_tmfifo *fifo;
+
+	fifo = container_of(work, struct mlxbf_tmfifo, work);
+	if (!fifo->is_ready)
+		return;
+
+	mutex_lock(&fifo->lock);
+
+	/* Tx (Send data to the TmFifo). */
+	mlxbf_tmfifo_work_rxtx(fifo, MLXBF_TMFIFO_VRING_TX,
+			       MLXBF_TM_TX_LWM_IRQ, false);
+
+	/* Rx (Receive data from the TmFifo). */
+	mlxbf_tmfifo_work_rxtx(fifo, MLXBF_TMFIFO_VRING_RX,
+			       MLXBF_TM_RX_HWM_IRQ, true);
+
+	mutex_unlock(&fifo->lock);
+}
+
+/* The notify function is called when new buffers are posted. */
+static bool mlxbf_tmfifo_virtio_notify(struct virtqueue *vq)
+{
+	struct mlxbf_tmfifo_vring *vring = vq->priv;
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct mlxbf_tmfifo *fifo;
+	unsigned long flags;
+
+	fifo = vring->fifo;
+
+	/*
+	 * Virtio maintains vrings in pairs, even number ring for Rx
+	 * and odd number ring for Tx.
+	 */
+	if (!(vring->index & BIT(0))) {
+		if (test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events))
+			return true;
+	} else {
+		/*
+		 * Console could make blocking call with interrupts disabled.
+		 * In such case, the vring needs to be served right away. For
+		 * other cases, just set the TX LWM bit to start Tx in the
+		 * worker handler.
+		 */
+		if (vring->vdev_id == VIRTIO_ID_CONSOLE) {
+			spin_lock_irqsave(&fifo->spin_lock, flags);
+			tm_vdev = fifo->vdev[VIRTIO_ID_CONSOLE];
+			mlxbf_tmfifo_console_output(tm_vdev, vring);
+			spin_unlock_irqrestore(&fifo->spin_lock, flags);
+		} else if (test_and_set_bit(MLXBF_TM_TX_LWM_IRQ,
+					    &fifo->pend_events)) {
+			return true;
+		}
+	}
+
+	schedule_work(&fifo->work);
+
+	return true;
+}
+
+/* Get the array of feature bits for this device. */
+static u64 mlxbf_tmfifo_virtio_get_features(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	return tm_vdev->features;
+}
+
+/* Confirm device features to use. */
+static int mlxbf_tmfifo_virtio_finalize_features(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->features = vdev->features;
+
+	return 0;
+}
+
+/* Free virtqueues found by find_vqs(). */
+static void mlxbf_tmfifo_virtio_del_vqs(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+	struct mlxbf_tmfifo_vring *vring;
+	struct virtqueue *vq;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+
+		/* Release the pending packet. */
+		if (vring->desc)
+			mlxbf_tmfifo_release_pending_pkt(vring);
+		vq = vring->vq;
+		if (vq) {
+			vring->vq = NULL;
+			vring_del_virtqueue(vq);
+		}
+	}
+}
+
+/* Create and initialize the virtual queues. */
+static int mlxbf_tmfifo_virtio_find_vqs(struct virtio_device *vdev,
+					unsigned int nvqs,
+					struct virtqueue *vqs[],
+					vq_callback_t *callbacks[],
+					const char * const names[],
+					const bool *ctx,
+					struct irq_affinity *desc)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+	struct mlxbf_tmfifo_vring *vring;
+	struct virtqueue *vq;
+	int i, ret, size;
+
+	if (nvqs > ARRAY_SIZE(tm_vdev->vrings))
+		return -EINVAL;
+
+	for (i = 0; i < nvqs; ++i) {
+		if (!names[i]) {
+			ret = -EINVAL;
+			goto error;
+		}
+		vring = &tm_vdev->vrings[i];
+
+		/* zero vring */
+		size = vring_size(vring->num, vring->align);
+		memset(vring->va, 0, size);
+		vq = vring_new_virtqueue(i, vring->num, vring->align, vdev,
+					 false, false, vring->va,
+					 mlxbf_tmfifo_virtio_notify,
+					 callbacks[i], names[i]);
+		if (!vq) {
+			dev_err(&vdev->dev, "vring_new_virtqueue failed\n");
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		vqs[i] = vq;
+		vring->vq = vq;
+		vq->priv = vring;
+	}
+
+	return 0;
+
+error:
+	mlxbf_tmfifo_virtio_del_vqs(vdev);
+	return ret;
+}
+
+/* Read the status byte. */
+static u8 mlxbf_tmfifo_virtio_get_status(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	return tm_vdev->status;
+}
+
+/* Write the status byte. */
+static void mlxbf_tmfifo_virtio_set_status(struct virtio_device *vdev,
+					   u8 status)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->status = status;
+}
+
+/* Reset the device. Not much here for now. */
+static void mlxbf_tmfifo_virtio_reset(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->status = 0;
+}
+
+/* Read the value of a configuration field. */
+static void mlxbf_tmfifo_virtio_get(struct virtio_device *vdev,
+				    unsigned int offset,
+				    void *buf,
+				    unsigned int len)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	if (offset + len > sizeof(tm_vdev->config))
+		return;
+
+	memcpy(buf, (u8 *)&tm_vdev->config + offset, len);
+}
+
+/* Write the value of a configuration field. */
+static void mlxbf_tmfifo_virtio_set(struct virtio_device *vdev,
+				    unsigned int offset,
+				    const void *buf,
+				    unsigned int len)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	if (offset + len > sizeof(tm_vdev->config))
+		return;
+
+	memcpy((u8 *)&tm_vdev->config + offset, buf, len);
+}
+
+/*
+ * Nothing to do for now. This function is needed to avoid warnings
+ * when the device is released in device_release().
+ */
+static void tmfifo_virtio_dev_release(struct device *dev)
+{
+}
+
+/* Virtio config operations. */
+static const struct virtio_config_ops mlxbf_tmfifo_virtio_config_ops = {
+	.get_features = mlxbf_tmfifo_virtio_get_features,
+	.finalize_features = mlxbf_tmfifo_virtio_finalize_features,
+	.find_vqs = mlxbf_tmfifo_virtio_find_vqs,
+	.del_vqs = mlxbf_tmfifo_virtio_del_vqs,
+	.reset = mlxbf_tmfifo_virtio_reset,
+	.set_status = mlxbf_tmfifo_virtio_set_status,
+	.get_status = mlxbf_tmfifo_virtio_get_status,
+	.get = mlxbf_tmfifo_virtio_get,
+	.set = mlxbf_tmfifo_virtio_set,
+};
+
+/* Create vdev type in a tmfifo. */
+static int mlxbf_tmfifo_create_vdev(struct device *dev,
+				    struct mlxbf_tmfifo *fifo,
+				    int vdev_id, u64 features,
+				    void *config, u32 size)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	int ret;
+
+	mutex_lock(&fifo->lock);
+
+	tm_vdev = fifo->vdev[vdev_id];
+	if (tm_vdev) {
+		dev_err(dev, "vdev %d already exists\n", vdev_id);
+		ret = -EEXIST;
+		goto fail;
+	}
+
+	tm_vdev = devm_kzalloc(dev, sizeof(*tm_vdev), GFP_KERNEL);
+	if (!tm_vdev) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	tm_vdev->vdev.id.device = vdev_id;
+	tm_vdev->vdev.config = &mlxbf_tmfifo_virtio_config_ops;
+	tm_vdev->vdev.dev.parent = &fifo->pdev->dev;
+	tm_vdev->vdev.dev.release = tmfifo_virtio_dev_release;
+	tm_vdev->features = features;
+	if (config)
+		memcpy(&tm_vdev->config, config, size);
+
+	if (mlxbf_tmfifo_alloc_vrings(fifo, tm_vdev)) {
+		dev_err(dev, "unable to allocate vring\n");
+		ret = -ENOMEM;
+		goto vdev_fail;
+	}
+
+	/* Allocate an output buffer for the console device. */
+	if (vdev_id == VIRTIO_ID_CONSOLE)
+		tm_vdev->tx_buf.buf = devm_kmalloc(dev,
+						   MLXBF_TMFIFO_CON_TX_BUF_SIZE,
+						   GFP_KERNEL);
+	fifo->vdev[vdev_id] = tm_vdev;
+
+	/* Register the virtio device. */
+	ret = register_virtio_device(&tm_vdev->vdev);
+	if (ret) {
+		dev_err(&fifo->pdev->dev, "register_virtio_device failed\n");
+		goto vdev_fail;
+	}
+
+	mutex_unlock(&fifo->lock);
+	return 0;
+
+vdev_fail:
+	mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
+	fifo->vdev[vdev_id] = NULL;
+fail:
+	mutex_unlock(&fifo->lock);
+	return ret;
+}
+
+/* Delete vdev type from a tmfifo. */
+static int mlxbf_tmfifo_delete_vdev(struct mlxbf_tmfifo *fifo, int vdev_id)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	mutex_lock(&fifo->lock);
+
+	/* Unregister vdev. */
+	tm_vdev = fifo->vdev[vdev_id];
+	if (tm_vdev) {
+		unregister_virtio_device(&tm_vdev->vdev);
+		mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
+		fifo->vdev[vdev_id] = NULL;
+	}
+
+	mutex_unlock(&fifo->lock);
+
+	return 0;
+}
+
+/* Read the configured network MAC address from efi variable. */
+static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
+{
+	efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
+	unsigned long size = ETH_ALEN;
+	efi_status_t status;
+	u8 buf[ETH_ALEN];
+
+	status = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size,
+				  buf);
+	if (status == EFI_SUCCESS && size == ETH_ALEN)
+		ether_addr_copy(mac, buf);
+	else
+		memcpy(mac, mlxbf_tmfifo_net_default_mac, ETH_ALEN);
+}
+
+/* Set TmFifo thresholds which are used to trigger interrupts. */
+static void mlxbf_tmfifo_set_threshold(struct mlxbf_tmfifo *fifo)
+{
+	u64 ctl;
+
+	/* Get Tx FIFO size and set the low/high watermark. */
+	ctl = readq(fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
+	fifo->tx_fifo_size =
+		FIELD_GET(MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK, ctl);
+	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__LWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__LWM_MASK,
+			   fifo->tx_fifo_size / 2);
+	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__HWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__HWM_MASK,
+			   fifo->tx_fifo_size - 1);
+	writeq(ctl, fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
+
+	/* Get Rx FIFO size and set the low/high watermark. */
+	ctl = readq(fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
+	fifo->rx_fifo_size =
+		FIELD_GET(MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK, ctl);
+	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__LWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__LWM_MASK, 0);
+	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__HWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__HWM_MASK, 1);
+	writeq(ctl, fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
+}
+
+static void mlxbf_tmfifo_cleanup(struct mlxbf_tmfifo *fifo)
+{
+	int i;
+
+	fifo->is_ready = false;
+	del_timer_sync(&fifo->timer);
+	mlxbf_tmfifo_disable_irqs(fifo);
+	cancel_work_sync(&fifo->work);
+	for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++)
+		mlxbf_tmfifo_delete_vdev(fifo, i);
+}
+
+/* Probe the TMFIFO. */
+static int mlxbf_tmfifo_probe(struct platform_device *pdev)
+{
+	struct virtio_net_config net_config;
+	struct mlxbf_tmfifo *fifo;
+	struct resource *res;
+	int i, ret;
+
+	fifo = devm_kzalloc(&pdev->dev, sizeof(*fifo), GFP_KERNEL);
+	if (!fifo)
+		return -ENOMEM;
+
+	spin_lock_init(&fifo->spin_lock);
+	INIT_WORK(&fifo->work, mlxbf_tmfifo_work_handler);
+	mutex_init(&fifo->lock);
+
+	/* Get the resource of the Rx FIFO. */
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	fifo->rx_base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(fifo->rx_base))
+		return PTR_ERR(fifo->rx_base);
+
+	/* Get the resource of the Tx FIFO. */
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+	fifo->tx_base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(fifo->tx_base))
+		return PTR_ERR(fifo->tx_base);
+
+	fifo->pdev = pdev;
+	platform_set_drvdata(pdev, fifo);
+
+	timer_setup(&fifo->timer, mlxbf_tmfifo_timer, 0);
+
+	for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
+		fifo->irq_info[i].index = i;
+		fifo->irq_info[i].fifo = fifo;
+		fifo->irq_info[i].irq = platform_get_irq(pdev, i);
+		ret = devm_request_irq(&pdev->dev, fifo->irq_info[i].irq,
+				       mlxbf_tmfifo_irq_handler, 0,
+				       "tmfifo", &fifo->irq_info[i]);
+		if (ret) {
+			dev_err(&pdev->dev, "devm_request_irq failed\n");
+			fifo->irq_info[i].irq = 0;
+			return ret;
+		}
+	}
+
+	mlxbf_tmfifo_set_threshold(fifo);
+
+	/* Create the console vdev. */
+	ret = mlxbf_tmfifo_create_vdev(&pdev->dev, fifo, VIRTIO_ID_CONSOLE, 0,
+				       NULL, 0);
+	if (ret)
+		goto fail;
+
+	/* Create the network vdev. */
+	memset(&net_config, 0, sizeof(net_config));
+	net_config.mtu = ETH_DATA_LEN;
+	net_config.status = VIRTIO_NET_S_LINK_UP;
+	mlxbf_tmfifo_get_cfg_mac(net_config.mac);
+	ret = mlxbf_tmfifo_create_vdev(&pdev->dev, fifo, VIRTIO_ID_NET,
+				       MLXBF_TMFIFO_NET_FEATURES, &net_config,
+				       sizeof(net_config));
+	if (ret)
+		goto fail;
+
+	mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
+
+	fifo->is_ready = true;
+	return 0;
+
+fail:
+	mlxbf_tmfifo_cleanup(fifo);
+	return ret;
+}
+
+/* Device remove function. */
+static int mlxbf_tmfifo_remove(struct platform_device *pdev)
+{
+	struct mlxbf_tmfifo *fifo = platform_get_drvdata(pdev);
+
+	mlxbf_tmfifo_cleanup(fifo);
+
+	return 0;
+}
+
+static const struct acpi_device_id mlxbf_tmfifo_acpi_match[] = {
+	{ "MLNXBF01", 0 },
+	{}
+};
+MODULE_DEVICE_TABLE(acpi, mlxbf_tmfifo_acpi_match);
+
+static struct platform_driver mlxbf_tmfifo_driver = {
+	.probe = mlxbf_tmfifo_probe,
+	.remove = mlxbf_tmfifo_remove,
+	.driver = {
+		.name = "bf-tmfifo",
+		.acpi_match_table = mlxbf_tmfifo_acpi_match,
+	},
+};
+
+module_platform_driver(mlxbf_tmfifo_driver);
+
+MODULE_DESCRIPTION("Mellanox BlueField SoC TmFifo Driver");
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Mellanox Technologies");
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v12] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
       [not found] <b143b40446c1870fb8d422b364ead95d54552be9.1527264077.git.lsun@mellanox.com>
                   ` (6 preceding siblings ...)
  2019-03-26 21:13 ` Liming Sun
@ 2019-03-28 19:56 ` Liming Sun
  2019-04-04 19:36 ` [PATCH v13] " Liming Sun
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: Liming Sun @ 2019-03-28 19:56 UTC (permalink / raw)
  To: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak
  Cc: Liming Sun, linux-kernel, platform-driver-x86

This commit adds the TmFifo platform driver for the Mellanox BlueField
SoC. TmFifo is a shared FIFO which enables an external host machine
to exchange data with the SoC via USB or PCIe. The driver is based
on the virtio framework and has console and network access enabled.

Reviewed-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Liming Sun <lsun@mellanox.com>
---
v11->v12:
    Fixed the two unresolved comments from v11 (a short sketch of both
    changes follows this changelog).
    - "Change macro mlxbf_vdev_to_tmfifo() to one line"
      Done. Seems not hard.
    - "Is it appropriate use of devm_* for 'tm_vdev = devm_kzalloc'"
      Yes, understand the comment now. The tmfifo is fixed, but the
      vdev is dynamic. Use kzalloc() instead, and free the device
      in the release callback which is the right place for it.
v10->v11:
    Fixes for comments from Andy:
    - Use GENMASK_ULL() instead of GENMASK() in mlxbf-tmfifo-regs.h
    - Removed the cpu_to_le64()/le64_to_cpu() conversion since
      readq()/writeq() already takes care of it.
    - Remove the "if (irq)" check in mlxbf_tmfifo_disable_irqs().
    - Add "u32 count" temp variable in mlxbf_tmfifo_get_tx_avail().
    - Clean up mlxbf_tmfifo_get_cfg_mac(), use ETH_ALEN instead of
      value 6.
    - Change the tx_buf to use Linux existing 'struct circ_buf'.
    Comment not applied:
    - "Change macro mlxbf_vdev_to_tmfifo() to one line"
      Couldn't fit it on one line within 80 characters
    - "Is it appropriate use of devm_* for 'tm_vdev = devm_kzalloc'"
      This is SoC, the device won't be closed or detached.
      The only case is when the driver is unloaded. So it appears
      ok to use devm_kzalloc() since it's allocated during probe()
      and released during module unload.
    Comments from Vadim: OK
v9->v10:
    Fixes for comments from Andy:
    - Use devm_ioremap_resource() instead of devm_ioremap().
    - Use kernel-doc comments.
    - Keep Makefile contents sorted.
    - Use same fixed format for offsets.
    - Use SZ_1K/SZ_32K instead of 1024/32*1024.
    - Remove unnecessary comments.
    - Use one style for max numbers.
    - More comments for mlxbf_tmfifo_vdev and mlxbf_tmfifo_data_64bit.
    - Use globally defined MTU instead of new definition.
    - Remove forward declaration of mlxbf_tmfifo_remove().
    - Remove PAGE_ALIGN() for dma_alloc_coherent().
    - Remove the cast of "struct vring *".
    - Check return result of test_and_set_bit().
    - Add a macro mlxbf_vdev_to_tmfifo().
    - Several other minor coding style comments.
    Comment not applied:
    - "Shouldn't be rather helper in EFI lib in kernel"
      Looks like efi.get_variable() is the way I found in the kernel
      tree.
    - "this one is not protected anyhow? Potential race condition"
      In mlxbf_tmfifo_console_tx(), the spin-lock is used to protect the
      'tx_buf' only, not the FIFO writes. So there is no race condition.
    - "Is __packed needed in mlxbf_tmfifo_msg_hdr".
      Yes, it is needed to make sure the structure is 8 bytes.
    Fixes for comments from Vadim:
    - Use tab in mlxbf-tmfifo-regs.h
    - Use kernel-doc comments for struct mlxbf_tmfifo_msg_hdr and
      mlxbf_tmfifo_irq_info as well.
    - Use _MAX instead of _CNT in the macro definition to be consistent.
    - Fix the MODULE_LICENSE.
    - Use BIT_ULL() instead of BIT().
    - Remove argument of 'avail' for mlxbf_tmfifo_rxtx_header() and
      mlxbf_tmfifo_rxtx_word()
    - Revise logic in mlxbf_tmfifo_rxtx_one_desc() to remove the
      WARN_ON().
    - Change "union mlxbf_tmfifo_u64 u" to "union mlxbf_tmfifo_u64 buf"
      in mlxbf_tmfifo_rxtx_word().
    - Change date type of vring_change from 'int' to 'bool'.
    - Remove the blank lines after Signed-off.
    - Don’t use declaration in the middle.
    - Make the network header initialization more elegant.
    - Change label done to mlxbf_tmfifo_desc_done.
    - Remove some unnecessary comments, and several other misc coding
      style comments.
    - Simplify code logic in mlxbf_tmfifo_virtio_notify()
    New changes by Liming:
    - Simplify the Rx/Tx function arguments to make it more readable.
v8->v9:
    Fixes for comments from Andy:
    - Use modern devm_xxx() API instead.
    Fixes for comments from Vadim:
    - Split the Rx/Tx function into smaller functions.
    - File name, copyright information.
    - Function and variable name conversion.
    - Local variable and indent coding styles.
    - Remove unnecessary 'inline' declarations.
    - Use devm_xxx() APIs.
    - Move the efi_char16_t MAC address definition to global.
    - Fix warnings reported by 'checkpatch --strict'.
    - Fix warnings reported by 'make CF="-D__CHECK_ENDIAN__"'.
    - Change 'select VIRTIO_xxx' to 'depends on VIRTIO_xxx' in Kconfig.
    - Merge mlxbf_tmfifo_vdev_tx_buf_push() and
      mlxbf_tmfifo_vdev_tx_buf_pop().
    - Add union to avoid casting between __le64 and u64.
    - Several other misc coding style comments.
    New changes by Liming:
    - Removed the DT binding documentation since only ACPI is
      supported for now by UEFI on the SoC.
v8: Re-submit under drivers/platform/mellanox for the target-side
    platform driver only.
v7: Added host side drivers into the same patch set.
v5~v6: Coding style fix.
v1~v4: Initial version for directory drivers/soc/mellanox.
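
As referenced above, a short sketch of the two v11->v12 changes (the one-line
macro and the kzalloc()/release-callback pairing). This illustrates the
described pattern and is not a verbatim excerpt of the v12 diff:

    #define mlxbf_vdev_to_tmfifo(d) container_of(d, struct mlxbf_tmfifo_vdev, vdev)

    /* Free the vdev memory from the virtio device release callback. */
    static void tmfifo_virtio_dev_release(struct device *device)
    {
            struct virtio_device *vdev =
                    container_of(device, struct virtio_device, dev);
            struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);

            kfree(tm_vdev);
    }

    /* In mlxbf_tmfifo_create_vdev(): plain kzalloc() instead of devm_kzalloc(). */
    tm_vdev = kzalloc(sizeof(*tm_vdev), GFP_KERNEL);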
---
 drivers/platform/mellanox/Kconfig             |   12 +-
 drivers/platform/mellanox/Makefile            |    1 +
 drivers/platform/mellanox/mlxbf-tmfifo-regs.h |   63 ++
 drivers/platform/mellanox/mlxbf-tmfifo.c      | 1291 +++++++++++++++++++++++++
 4 files changed, 1366 insertions(+), 1 deletion(-)
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo-regs.h
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo.c

diff --git a/drivers/platform/mellanox/Kconfig b/drivers/platform/mellanox/Kconfig
index cd8a908..530fe7e 100644
--- a/drivers/platform/mellanox/Kconfig
+++ b/drivers/platform/mellanox/Kconfig
@@ -5,7 +5,7 @@
 
 menuconfig MELLANOX_PLATFORM
 	bool "Platform support for Mellanox hardware"
-	depends on X86 || ARM || COMPILE_TEST
+	depends on X86 || ARM || ARM64 || COMPILE_TEST
 	---help---
 	  Say Y here to get to see options for platform support for
 	  Mellanox systems. This option alone does not add any kernel code.
@@ -34,4 +34,14 @@ config MLXREG_IO
 	  to system resets operation, system reset causes monitoring and some
 	  kinds of mux selection.
 
+config MLXBF_TMFIFO
+	tristate "Mellanox BlueField SoC TmFifo platform driver"
+	depends on ARM64
+	depends on ACPI
+	depends on VIRTIO_CONSOLE && VIRTIO_NET
+	help
+	  Say y here to enable TmFifo support. The TmFifo driver provides
+          platform driver support for the TmFifo which supports console
+          and networking based on the virtio framework.
+
 endif # MELLANOX_PLATFORM
diff --git a/drivers/platform/mellanox/Makefile b/drivers/platform/mellanox/Makefile
index 57074d9c..a229bda1 100644
--- a/drivers/platform/mellanox/Makefile
+++ b/drivers/platform/mellanox/Makefile
@@ -3,5 +3,6 @@
 # Makefile for linux/drivers/platform/mellanox
 # Mellanox Platform-Specific Drivers
 #
+obj-$(CONFIG_MLXBF_TMFIFO)	+= mlxbf-tmfifo.o
 obj-$(CONFIG_MLXREG_HOTPLUG)	+= mlxreg-hotplug.o
 obj-$(CONFIG_MLXREG_IO) += mlxreg-io.o
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo-regs.h b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
new file mode 100644
index 0000000..e4f0d2e
--- /dev/null
+++ b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
+ */
+
+#ifndef __MLXBF_TMFIFO_REGS_H__
+#define __MLXBF_TMFIFO_REGS_H__
+
+#include <linux/types.h>
+#include <linux/bits.h>
+
+#define MLXBF_TMFIFO_TX_DATA				0x00
+#define MLXBF_TMFIFO_TX_STS				0x08
+#define MLXBF_TMFIFO_TX_STS__LENGTH			0x0001
+#define MLXBF_TMFIFO_TX_STS__COUNT_SHIFT		0
+#define MLXBF_TMFIFO_TX_STS__COUNT_WIDTH		9
+#define MLXBF_TMFIFO_TX_STS__COUNT_RESET_VAL		0
+#define MLXBF_TMFIFO_TX_STS__COUNT_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_TX_STS__COUNT_MASK			GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_TX_CTL				0x10
+#define MLXBF_TMFIFO_TX_CTL__LENGTH			0x0001
+#define MLXBF_TMFIFO_TX_CTL__LWM_SHIFT			0
+#define MLXBF_TMFIFO_TX_CTL__LWM_WIDTH			8
+#define MLXBF_TMFIFO_TX_CTL__LWM_RESET_VAL		128
+#define MLXBF_TMFIFO_TX_CTL__LWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__LWM_MASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__HWM_SHIFT			8
+#define MLXBF_TMFIFO_TX_CTL__HWM_WIDTH			8
+#define MLXBF_TMFIFO_TX_CTL__HWM_RESET_VAL		128
+#define MLXBF_TMFIFO_TX_CTL__HWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__HWM_MASK			GENMASK_ULL(15, 8)
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_SHIFT		32
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_WIDTH		9
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RESET_VAL	256
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK		GENMASK_ULL(40, 32)
+#define MLXBF_TMFIFO_RX_DATA				0x00
+#define MLXBF_TMFIFO_RX_STS				0x08
+#define MLXBF_TMFIFO_RX_STS__LENGTH			0x0001
+#define MLXBF_TMFIFO_RX_STS__COUNT_SHIFT		0
+#define MLXBF_TMFIFO_RX_STS__COUNT_WIDTH		9
+#define MLXBF_TMFIFO_RX_STS__COUNT_RESET_VAL		0
+#define MLXBF_TMFIFO_RX_STS__COUNT_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_RX_STS__COUNT_MASK			GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_RX_CTL				0x10
+#define MLXBF_TMFIFO_RX_CTL__LENGTH			0x0001
+#define MLXBF_TMFIFO_RX_CTL__LWM_SHIFT			0
+#define MLXBF_TMFIFO_RX_CTL__LWM_WIDTH			8
+#define MLXBF_TMFIFO_RX_CTL__LWM_RESET_VAL		128
+#define MLXBF_TMFIFO_RX_CTL__LWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__LWM_MASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__HWM_SHIFT			8
+#define MLXBF_TMFIFO_RX_CTL__HWM_WIDTH			8
+#define MLXBF_TMFIFO_RX_CTL__HWM_RESET_VAL		128
+#define MLXBF_TMFIFO_RX_CTL__HWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__HWM_MASK			GENMASK_ULL(15, 8)
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_SHIFT		32
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_WIDTH		9
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RESET_VAL	256
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK		GENMASK_ULL(40, 32)
+
+#endif /* !defined(__MLXBF_TMFIFO_REGS_H__) */
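
The single-field layouts above (COUNT, LWM/HWM, MAX_ENTRIES) are meant to
be consumed with the <linux/bitfield.h> helpers. A minimal sketch, not part
of the patch (the function names here are made up):

    #include <linux/bitfield.h>
    #include <linux/io.h>
    #include "mlxbf-tmfifo-regs.h"

    /* Number of 64-bit words currently queued in the Tx FIFO. */
    static u32 example_tx_count(void __iomem *tx_base)
    {
            u64 sts = readq(tx_base + MLXBF_TMFIFO_TX_STS);

            return FIELD_GET(MLXBF_TMFIFO_TX_STS__COUNT_MASK, sts);
    }

    /* Total number of entries the Tx FIFO can hold. */
    static u32 example_tx_size(void __iomem *tx_base)
    {
            u64 ctl = readq(tx_base + MLXBF_TMFIFO_TX_CTL);

            return FIELD_GET(MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK, ctl);
    }

The driver below does exactly this in mlxbf_tmfifo_get_tx_avail() and
mlxbf_tmfifo_set_threshold().
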
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo.c b/drivers/platform/mellanox/mlxbf-tmfifo.c
new file mode 100644
index 0000000..2bc03c3
--- /dev/null
+++ b/drivers/platform/mellanox/mlxbf-tmfifo.c
@@ -0,0 +1,1291 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Mellanox BlueField SoC TmFifo driver
+ *
+ * Copyright (C) 2019 Mellanox Technologies
+ */
+
+#include <linux/acpi.h>
+#include <linux/bitfield.h>
+#include <linux/circ_buf.h>
+#include <linux/efi.h>
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/platform_device.h>
+#include <linux/types.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_console.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_net.h>
+#include <linux/virtio_ring.h>
+
+#include "mlxbf-tmfifo-regs.h"
+
+/* Vring size. */
+#define MLXBF_TMFIFO_VRING_SIZE			SZ_1K
+
+/* Console Tx buffer size. */
+#define MLXBF_TMFIFO_CON_TX_BUF_SIZE		SZ_32K
+
+/* Console Tx buffer reserved space. */
+#define MLXBF_TMFIFO_CON_TX_BUF_RSV_SIZE	8
+
+/* House-keeping timer interval. */
+#define MLXBF_TMFIFO_TIMER_INTERVAL		(HZ / 10)
+
+/* Virtual devices sharing the TM FIFO. */
+#define MLXBF_TMFIFO_VDEV_MAX		(VIRTIO_ID_CONSOLE + 1)
+
+/*
+ * Reserve 1/16 of TmFifo space, so console messages are not starved by
+ * the networking traffic.
+ */
+#define MLXBF_TMFIFO_RESERVE_RATIO		16
+
+/* Message with data needs at least two words (for header & data). */
+#define MLXBF_TMFIFO_DATA_MIN_WORDS		2
+
+struct mlxbf_tmfifo;
+
+/**
+ * mlxbf_tmfifo_vring - Structure of the TmFifo virtual ring
+ * @va: virtual address of the ring
+ * @dma: dma address of the ring
+ * @vq: pointer to the virtio virtqueue
+ * @desc: current descriptor of the pending packet
+ * @desc_head: head descriptor of the pending packet
+ * @cur_len: processed length of the current descriptor
+ * @rem_len: remaining length of the pending packet
+ * @pkt_len: total length of the pending packet
+ * @next_avail: next avail descriptor id
+ * @num: vring size (number of descriptors)
+ * @align: vring alignment size
+ * @index: vring index
+ * @vdev_id: vring virtio id (VIRTIO_ID_xxx)
+ * @fifo: pointer to the tmfifo structure
+ */
+struct mlxbf_tmfifo_vring {
+	void *va;
+	dma_addr_t dma;
+	struct virtqueue *vq;
+	struct vring_desc *desc;
+	struct vring_desc *desc_head;
+	int cur_len;
+	int rem_len;
+	u32 pkt_len;
+	u16 next_avail;
+	int num;
+	int align;
+	int index;
+	int vdev_id;
+	struct mlxbf_tmfifo *fifo;
+};
+
+/* Interrupt types. */
+enum {
+	MLXBF_TM_RX_LWM_IRQ,
+	MLXBF_TM_RX_HWM_IRQ,
+	MLXBF_TM_TX_LWM_IRQ,
+	MLXBF_TM_TX_HWM_IRQ,
+	MLXBF_TM_MAX_IRQ
+};
+
+/* Ring types (Rx & Tx). */
+enum {
+	MLXBF_TMFIFO_VRING_RX,
+	MLXBF_TMFIFO_VRING_TX,
+	MLXBF_TMFIFO_VRING_MAX
+};
+
+/**
+ * mlxbf_tmfifo_vdev - Structure of the TmFifo virtual device
+ * @vdev: virtio device, in which the vdev.id.device field has the
+ *        VIRTIO_ID_xxx id to distinguish the virtual device.
+ * @status: status of the device
+ * @features: supported features of the device
+ * @vrings: array of tmfifo vrings of this device
+ * @config.cons: virtual console config -
+ *               select if vdev.id.device is VIRTIO_ID_CONSOLE
+ * @config.net: virtual network config -
+ *              select if vdev.id.device is VIRTIO_ID_NET
+ * @tx_buf: tx buffer used to buffer data before writing into the FIFO
+ */
+struct mlxbf_tmfifo_vdev {
+	struct virtio_device vdev;
+	u8 status;
+	u64 features;
+	struct mlxbf_tmfifo_vring vrings[MLXBF_TMFIFO_VRING_MAX];
+	union {
+		struct virtio_console_config cons;
+		struct virtio_net_config net;
+	} config;
+	struct circ_buf tx_buf;
+};
+
+/**
+ * mlxbf_tmfifo_irq_info - Structure of the interrupt information
+ * @fifo: pointer to the tmfifo structure
+ * @irq: interrupt number
+ * @index: index into the interrupt array
+ */
+struct mlxbf_tmfifo_irq_info {
+	struct mlxbf_tmfifo *fifo;
+	int irq;
+	int index;
+};
+
+/**
+ * mlxbf_tmfifo - Structure of the TmFifo
+ * @vdev: array of the virtual devices running over the TmFifo
+ * @pdev: platform device
+ * @lock: lock to protect the TmFifo access
+ * @rx_base: mapped register base address for the Rx fifo
+ * @tx_base: mapped register base address for the Tx fifo
+ * @rx_fifo_size: number of entries of the Rx fifo
+ * @tx_fifo_size: number of entries of the Tx fifo
+ * @pend_events: pending bits for deferred events
+ * @irq_info: interrupt information
+ * @work: work struct for deferred process
+ * @timer: background timer
+ * @vring: Tx/Rx ring
+ * @spin_lock: spin lock
+ * @is_ready: ready flag
+ */
+struct mlxbf_tmfifo {
+	struct mlxbf_tmfifo_vdev *vdev[MLXBF_TMFIFO_VDEV_MAX];
+	struct platform_device *pdev;
+	struct mutex lock;		/* TmFifo lock */
+	void __iomem *rx_base;
+	void __iomem *tx_base;
+	int rx_fifo_size;
+	int tx_fifo_size;
+	unsigned long pend_events;
+	struct mlxbf_tmfifo_irq_info irq_info[MLXBF_TM_MAX_IRQ];
+	struct work_struct work;
+	struct timer_list timer;
+	struct mlxbf_tmfifo_vring *vring[2];
+	spinlock_t spin_lock;		/* spin lock */
+	bool is_ready;
+};
+
+/**
+ * mlxbf_tmfifo_msg_hdr - Structure of the TmFifo message header
+ * @type: message type
+ * @len: payload length
+ * @data: 64-bit union data
+ */
+union mlxbf_tmfifo_msg_hdr {
+	struct {
+		u8 type;
+		__be16 len;
+		u8 unused[5];
+	} __packed;
+	u64 data;
+};
+
+/*
+ * Default MAC.
+ * This MAC address will be read from EFI persistent variable if configured.
+ * It can also be reconfigured with standard Linux tools.
+ */
+static u8 mlxbf_tmfifo_net_default_mac[ETH_ALEN] = {
+	0x00, 0x1A, 0xCA, 0xFF, 0xFF, 0x01};
+
+/* EFI variable name of the MAC address. */
+static efi_char16_t mlxbf_tmfifo_efi_name[] = L"RshimMacAddr";
+
+/* Maximum L2 header length. */
+#define MLXBF_TMFIFO_NET_L2_OVERHEAD	36
+
+/* Supported virtio-net features. */
+#define MLXBF_TMFIFO_NET_FEATURES	(BIT_ULL(VIRTIO_NET_F_MTU) | \
+					 BIT_ULL(VIRTIO_NET_F_STATUS) | \
+					 BIT_ULL(VIRTIO_NET_F_MAC))
+
+#define mlxbf_vdev_to_tmfifo(d) container_of(d, struct mlxbf_tmfifo_vdev, vdev)
+
+/* Allocate vrings for the fifo. */
+static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
+				     struct mlxbf_tmfifo_vdev *tm_vdev)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	struct device *dev;
+	dma_addr_t dma;
+	int i, size;
+	void *va;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+		vring->fifo = fifo;
+		vring->num = MLXBF_TMFIFO_VRING_SIZE;
+		vring->align = SMP_CACHE_BYTES;
+		vring->index = i;
+		vring->vdev_id = tm_vdev->vdev.id.device;
+		dev = &tm_vdev->vdev.dev;
+
+		size = vring_size(vring->num, vring->align);
+		va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL);
+		if (!va) {
+			dev_err(dev->parent, "dma_alloc_coherent failed\n");
+			return -ENOMEM;
+		}
+
+		vring->va = va;
+		vring->dma = dma;
+	}
+
+	return 0;
+}
+
+/* Free vrings of the fifo device. */
+static void mlxbf_tmfifo_free_vrings(struct mlxbf_tmfifo *fifo,
+				     struct mlxbf_tmfifo_vdev *tm_vdev)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	int i, size;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+		if (vring->va) {
+			size = vring_size(vring->num, vring->align);
+			dma_free_coherent(tm_vdev->vdev.dev.parent, size,
+					  vring->va, vring->dma);
+			vring->va = NULL;
+			if (vring->vq) {
+				vring_del_virtqueue(vring->vq);
+				vring->vq = NULL;
+			}
+		}
+	}
+}
+
+/* Disable interrupts of the fifo device. */
+static void mlxbf_tmfifo_disable_irqs(struct mlxbf_tmfifo *fifo)
+{
+	int i, irq;
+
+	for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
+		irq = fifo->irq_info[i].irq;
+		fifo->irq_info[i].irq = 0;
+		disable_irq(irq);
+	}
+}
+
+/* Interrupt handler. */
+static irqreturn_t mlxbf_tmfifo_irq_handler(int irq, void *arg)
+{
+	struct mlxbf_tmfifo_irq_info *irq_info = arg;
+
+	if (irq_info->index < MLXBF_TM_MAX_IRQ &&
+	    !test_and_set_bit(irq_info->index, &irq_info->fifo->pend_events))
+		schedule_work(&irq_info->fifo->work);
+
+	return IRQ_HANDLED;
+}
+
+/* Get the next packet descriptor from the vring. */
+static struct vring_desc *
+mlxbf_tmfifo_get_next_desc(struct mlxbf_tmfifo_vring *vring)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	unsigned int idx, head;
+
+	if (vring->next_avail == virtio16_to_cpu(vdev, vr->avail->idx))
+		return NULL;
+
+	idx = vring->next_avail % vr->num;
+	head = virtio16_to_cpu(vdev, vr->avail->ring[idx]);
+	if (WARN_ON(head >= vr->num))
+		return NULL;
+
+	vring->next_avail++;
+
+	return &vr->desc[head];
+}
+
+/* Release virtio descriptor. */
+static void mlxbf_tmfifo_release_desc(struct mlxbf_tmfifo_vring *vring,
+				      struct vring_desc *desc, u32 len)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	u16 idx, vr_idx;
+
+	vr_idx = virtio16_to_cpu(vdev, vr->used->idx);
+	idx = vr_idx % vr->num;
+	vr->used->ring[idx].id = cpu_to_virtio32(vdev, desc - vr->desc);
+	vr->used->ring[idx].len = cpu_to_virtio32(vdev, len);
+
+	/*
+	 * Virtio could poll and check the 'idx' to decide whether the desc is
+	 * done or not. Add a memory barrier here to make sure the update above
+	 * completes before updating the idx.
+	 */
+	mb();
+	vr->used->idx = cpu_to_virtio16(vdev, vr_idx + 1);
+}
+
+/* Get the total length of the descriptor chain. */
+static u32 mlxbf_tmfifo_get_pkt_len(struct mlxbf_tmfifo_vring *vring,
+				    struct vring_desc *desc)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	u32 len = 0, idx;
+
+	while (desc) {
+		len += virtio32_to_cpu(vdev, desc->len);
+		if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
+			break;
+		idx = virtio16_to_cpu(vdev, desc->next);
+		desc = &vr->desc[idx];
+	}
+
+	return len;
+}
+
+static void mlxbf_tmfifo_release_pending_pkt(struct mlxbf_tmfifo_vring *vring)
+{
+	struct vring_desc *desc_head;
+	u32 len = 0;
+
+	if (vring->desc_head) {
+		desc_head = vring->desc_head;
+		len = vring->pkt_len;
+	} else {
+		desc_head = mlxbf_tmfifo_get_next_desc(vring);
+		if (desc_head)
+			len = mlxbf_tmfifo_get_pkt_len(vring, desc_head);
+	}
+
+	if (desc_head)
+		mlxbf_tmfifo_release_desc(vring, desc_head, len);
+
+	vring->pkt_len = 0;
+	vring->desc = NULL;
+	vring->desc_head = NULL;
+}
+
+static void mlxbf_tmfifo_init_net_desc(struct mlxbf_tmfifo_vring *vring,
+				       struct vring_desc *desc, bool is_rx)
+{
+	struct virtio_device *vdev = vring->vq->vdev;
+	struct virtio_net_hdr *net_hdr;
+
+	net_hdr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+	memset(net_hdr, 0, sizeof(*net_hdr));
+}
+
+/* Get and initialize the next packet. */
+static struct vring_desc *
+mlxbf_tmfifo_get_next_pkt(struct mlxbf_tmfifo_vring *vring, bool is_rx)
+{
+	struct vring_desc *desc;
+
+	desc = mlxbf_tmfifo_get_next_desc(vring);
+	if (desc && is_rx && vring->vdev_id == VIRTIO_ID_NET)
+		mlxbf_tmfifo_init_net_desc(vring, desc, is_rx);
+
+	vring->desc_head = desc;
+	vring->desc = desc;
+
+	return desc;
+}
+
+/* House-keeping timer. */
+static void mlxbf_tmfifo_timer(struct timer_list *arg)
+{
+	struct mlxbf_tmfifo *fifo = container_of(arg, struct mlxbf_tmfifo,
+						 timer);
+	int more;
+
+	more = !test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events) ||
+		    !test_and_set_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events);
+
+	if (more)
+		schedule_work(&fifo->work);
+
+	mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
+}
+
+/* Copy one console packet into the output buffer. */
+static void mlxbf_tmfifo_console_output_one(struct mlxbf_tmfifo_vdev *cons,
+					    struct mlxbf_tmfifo_vring *vring,
+					    struct vring_desc *desc)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = &cons->vdev;
+	u32 len, idx, seg;
+	void *addr;
+
+	while (desc) {
+		addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+		len = virtio32_to_cpu(vdev, desc->len);
+
+		seg = CIRC_SPACE_TO_END(cons->tx_buf.head, cons->tx_buf.tail,
+					MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+		if (len <= seg) {
+			memcpy(cons->tx_buf.buf + cons->tx_buf.head, addr, len);
+		} else {
+			memcpy(cons->tx_buf.buf + cons->tx_buf.head, addr, seg);
+			addr += seg;
+			memcpy(cons->tx_buf.buf, addr, len - seg);
+		}
+		cons->tx_buf.head = (cons->tx_buf.head + len) %
+			MLXBF_TMFIFO_CON_TX_BUF_SIZE;
+
+		if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
+			break;
+		idx = virtio16_to_cpu(vdev, desc->next);
+		desc = &vr->desc[idx];
+	}
+}
+
+/* Copy console data into the output buffer. */
+static void mlxbf_tmfifo_console_output(struct mlxbf_tmfifo_vdev *cons,
+					struct mlxbf_tmfifo_vring *vring)
+{
+	struct vring_desc *desc;
+	u32 len, avail;
+
+	desc = mlxbf_tmfifo_get_next_desc(vring);
+	while (desc) {
+		/* Release the packet if not enough space. */
+		len = mlxbf_tmfifo_get_pkt_len(vring, desc);
+		avail = CIRC_SPACE(cons->tx_buf.head, cons->tx_buf.tail,
+				   MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+		if (len + MLXBF_TMFIFO_CON_TX_BUF_RSV_SIZE > avail) {
+			mlxbf_tmfifo_release_desc(vring, desc, len);
+			break;
+		}
+
+		mlxbf_tmfifo_console_output_one(cons, vring, desc);
+		mlxbf_tmfifo_release_desc(vring, desc, len);
+		desc = mlxbf_tmfifo_get_next_desc(vring);
+	}
+}
+
+/* Get the number of available words in Rx FIFO for receiving. */
+static int mlxbf_tmfifo_get_rx_avail(struct mlxbf_tmfifo *fifo)
+{
+	u64 sts;
+
+	sts = readq(fifo->rx_base + MLXBF_TMFIFO_RX_STS);
+	return FIELD_GET(MLXBF_TMFIFO_RX_STS__COUNT_MASK, sts);
+}
+
+/* Get the number of available words in the TmFifo for sending. */
+static int mlxbf_tmfifo_get_tx_avail(struct mlxbf_tmfifo *fifo, int vdev_id)
+{
+	int tx_reserve;
+	u32 count;
+	u64 sts;
+
+	/* Reserve some room in FIFO for console messages. */
+	if (vdev_id == VIRTIO_ID_NET)
+		tx_reserve = fifo->tx_fifo_size / MLXBF_TMFIFO_RESERVE_RATIO;
+	else
+		tx_reserve = 1;
+
+	sts = readq(fifo->tx_base + MLXBF_TMFIFO_TX_STS);
+	count = FIELD_GET(MLXBF_TMFIFO_TX_STS__COUNT_MASK, sts);
+	return fifo->tx_fifo_size - tx_reserve - count;
+}
+
+/* Console Tx (move data from the output buffer into the TmFifo). */
+static void mlxbf_tmfifo_console_tx(struct mlxbf_tmfifo *fifo, int avail)
+{
+	union mlxbf_tmfifo_msg_hdr hdr;
+	struct mlxbf_tmfifo_vdev *cons;
+	unsigned long flags;
+	int size, seg;
+	void *addr;
+	u64 data;
+
+	/* Return if not enough space available. */
+	if (avail < MLXBF_TMFIFO_DATA_MIN_WORDS)
+		return;
+
+	cons = fifo->vdev[VIRTIO_ID_CONSOLE];
+	if (!cons || !cons->tx_buf.buf)
+		return;
+
+	/* Return if no data to send. */
+	size = CIRC_CNT(cons->tx_buf.head, cons->tx_buf.tail,
+			MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+	if (size == 0)
+		return;
+
+	/* Adjust the size to available space. */
+	if (size + sizeof(hdr) > avail * sizeof(u64))
+		size = avail * sizeof(u64) - sizeof(hdr);
+
+	/* Write header. */
+	hdr.data = 0;
+	hdr.type = VIRTIO_ID_CONSOLE;
+	hdr.len = htons(size);
+	writeq(hdr.data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+
+	/* Use spin-lock to protect the 'cons->tx_buf'. */
+	spin_lock_irqsave(&fifo->spin_lock, flags);
+
+	while (size > 0) {
+		addr = cons->tx_buf.buf + cons->tx_buf.tail;
+
+		seg = CIRC_CNT_TO_END(cons->tx_buf.head, cons->tx_buf.tail,
+				      MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+		if (seg >= sizeof(u64)) {
+			memcpy(&data, addr, sizeof(u64));
+		} else {
+			memcpy(&data, addr, seg);
+			memcpy((u8 *)&data + seg, cons->tx_buf.buf,
+			       sizeof(u64) - seg);
+		}
+		writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+
+		if (size >= sizeof(u64)) {
+			cons->tx_buf.tail = (cons->tx_buf.tail + sizeof(u64)) %
+				MLXBF_TMFIFO_CON_TX_BUF_SIZE;
+			size -= sizeof(u64);
+		} else {
+			cons->tx_buf.tail = (cons->tx_buf.tail + size) %
+				MLXBF_TMFIFO_CON_TX_BUF_SIZE;
+			size = 0;
+		}
+	}
+
+	spin_unlock_irqrestore(&fifo->spin_lock, flags);
+}
+
+/* Rx/Tx one word in the descriptor buffer. */
+static void mlxbf_tmfifo_rxtx_word(struct mlxbf_tmfifo_vring *vring,
+				   struct vring_desc *desc,
+				   bool is_rx, int len)
+{
+	struct virtio_device *vdev = vring->vq->vdev;
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	void *addr;
+	u64 data;
+
+	/* Get the buffer address of this desc. */
+	addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+
+	/* Read a word from FIFO for Rx. */
+	if (is_rx)
+		data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
+
+	if (vring->cur_len + sizeof(u64) <= len) {
+		/* The whole word. */
+		if (is_rx)
+			memcpy(addr + vring->cur_len, &data, sizeof(u64));
+		else
+			memcpy(&data, addr + vring->cur_len, sizeof(u64));
+		vring->cur_len += sizeof(u64);
+	} else {
+		/* Leftover bytes. */
+		if (is_rx)
+			memcpy(addr + vring->cur_len, &data,
+			       len - vring->cur_len);
+		else
+			memcpy(&data, addr + vring->cur_len,
+			       len - vring->cur_len);
+		vring->cur_len = len;
+	}
+
+	/* Write the word into FIFO for Tx. */
+	if (!is_rx)
+		writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+}
+
+/*
+ * Rx/Tx packet header.
+ *
+ * In Rx case, the packet might be found to belong to a different vring since
+ * the TmFifo is shared by different services. In such case, the 'vring_change'
+ * flag is set.
+ */
+static void mlxbf_tmfifo_rxtx_header(struct mlxbf_tmfifo_vring *vring,
+				     struct vring_desc *desc,
+				     bool is_rx, bool *vring_change)
+{
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	struct virtio_net_config *config;
+	union mlxbf_tmfifo_msg_hdr hdr;
+	int vdev_id, hdr_len;
+
+	/* Read/Write packet header. */
+	if (is_rx) {
+		/* Drain one word from the FIFO. */
+		hdr.data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
+
+		/* Skip the length 0 packets (keepalive). */
+		if (hdr.len == 0)
+			return;
+
+		/* Check packet type. */
+		if (hdr.type == VIRTIO_ID_NET) {
+			vdev_id = VIRTIO_ID_NET;
+			hdr_len = sizeof(struct virtio_net_hdr);
+			config = &fifo->vdev[vdev_id]->config.net;
+			if (ntohs(hdr.len) > config->mtu +
+			    MLXBF_TMFIFO_NET_L2_OVERHEAD)
+				return;
+		} else {
+			vdev_id = VIRTIO_ID_CONSOLE;
+			hdr_len = 0;
+		}
+
+		/*
+		 * Check whether the new packet still belongs to this vring.
+		 * If not, update the pkt_len of the new vring.
+		 */
+		if (vdev_id != vring->vdev_id) {
+			struct mlxbf_tmfifo_vdev *tm_dev2 = fifo->vdev[vdev_id];
+
+			if (!tm_dev2)
+				return;
+			vring->desc = desc;
+			vring = &tm_dev2->vrings[MLXBF_TMFIFO_VRING_RX];
+			*vring_change = true;
+		}
+		vring->pkt_len = ntohs(hdr.len) + hdr_len;
+	} else {
+		/* Network virtio has an extra header. */
+		hdr_len = (vring->vdev_id == VIRTIO_ID_NET) ?
+			   sizeof(struct virtio_net_hdr) : 0;
+		vring->pkt_len = mlxbf_tmfifo_get_pkt_len(vring, desc);
+		hdr.data = 0;
+		hdr.type = (vring->vdev_id == VIRTIO_ID_NET) ?
+			    VIRTIO_ID_NET : VIRTIO_ID_CONSOLE;
+		hdr.len = htons(vring->pkt_len - hdr_len);
+		writeq(hdr.data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+	}
+
+	vring->cur_len = hdr_len;
+	vring->rem_len = vring->pkt_len;
+	fifo->vring[is_rx] = vring;
+}
+
+/*
+ * Rx/Tx one descriptor.
+ *
+ * Return true to indicate more data available.
+ */
+static bool mlxbf_tmfifo_rxtx_one_desc(struct mlxbf_tmfifo_vring *vring,
+				       bool is_rx, int *avail)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	struct virtio_device *vdev;
+	bool vring_change = false;
+	struct vring_desc *desc;
+	unsigned long flags;
+	u32 len, idx;
+
+	vdev = &fifo->vdev[vring->vdev_id]->vdev;
+
+	/* Get the descriptor of the next packet. */
+	if (!vring->desc) {
+		desc = mlxbf_tmfifo_get_next_pkt(vring, is_rx);
+		if (!desc)
+			return false;
+	} else {
+		desc = vring->desc;
+	}
+
+	/* Beginning of a packet. Start to Rx/Tx packet header. */
+	if (vring->pkt_len == 0) {
+		mlxbf_tmfifo_rxtx_header(vring, desc, is_rx, &vring_change);
+		(*avail)--;
+
+		/* Return if new packet is for another ring. */
+		if (vring_change)
+			return false;
+		goto mlxbf_tmfifo_desc_done;
+	}
+
+	/* Get the length of this desc. */
+	len = virtio32_to_cpu(vdev, desc->len);
+	if (len > vring->rem_len)
+		len = vring->rem_len;
+
+	/* Rx/Tx one word (8 bytes) if not done. */
+	if (vring->cur_len < len) {
+		mlxbf_tmfifo_rxtx_word(vring, desc, is_rx, len);
+		(*avail)--;
+	}
+
+	/* Check again whether it's done. */
+	if (vring->cur_len == len) {
+		vring->cur_len = 0;
+		vring->rem_len -= len;
+
+		/* Get the next desc on the chain. */
+		if (vring->rem_len > 0 &&
+		    (virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT)) {
+			idx = virtio16_to_cpu(vdev, desc->next);
+			desc = &vr->desc[idx];
+			goto mlxbf_tmfifo_desc_done;
+		}
+
+		/* Done and release the pending packet. */
+		mlxbf_tmfifo_release_pending_pkt(vring);
+		desc = NULL;
+		fifo->vring[is_rx] = NULL;
+
+		/* Notify upper layer that packet is done. */
+		spin_lock_irqsave(&fifo->spin_lock, flags);
+		vring_interrupt(0, vring->vq);
+		spin_unlock_irqrestore(&fifo->spin_lock, flags);
+	}
+
+mlxbf_tmfifo_desc_done:
+	/* Save the current desc. */
+	vring->desc = desc;
+
+	return true;
+}
+
+/* Rx & Tx processing of a queue. */
+static void mlxbf_tmfifo_rxtx(struct mlxbf_tmfifo_vring *vring, bool is_rx)
+{
+	int avail = 0, devid = vring->vdev_id;
+	struct mlxbf_tmfifo *fifo;
+	bool more;
+
+	fifo = vring->fifo;
+
+	/* Return if vdev is not ready. */
+	if (!fifo->vdev[devid])
+		return;
+
+	/* Return if another vring is running. */
+	if (fifo->vring[is_rx] && fifo->vring[is_rx] != vring)
+		return;
+
+	/* Only handle console and network for now. */
+	if (WARN_ON(devid != VIRTIO_ID_NET && devid != VIRTIO_ID_CONSOLE))
+		return;
+
+	do {
+		/* Get available FIFO space. */
+		if (avail == 0) {
+			if (is_rx)
+				avail = mlxbf_tmfifo_get_rx_avail(fifo);
+			else
+				avail = mlxbf_tmfifo_get_tx_avail(fifo, devid);
+			if (avail <= 0)
+				break;
+		}
+
+		/* Console output always comes from the Tx buffer. */
+		if (!is_rx && devid == VIRTIO_ID_CONSOLE) {
+			mlxbf_tmfifo_console_tx(fifo, avail);
+			break;
+		}
+
+		/* Handle one descriptor. */
+		more = mlxbf_tmfifo_rxtx_one_desc(vring, is_rx, &avail);
+	} while (more);
+}
+
+/* Handle Rx or Tx queues. */
+static void mlxbf_tmfifo_work_rxtx(struct mlxbf_tmfifo *fifo, int queue_id,
+				   int irq_id, bool is_rx)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct mlxbf_tmfifo_vring *vring;
+	int i;
+
+	if (!test_and_clear_bit(irq_id, &fifo->pend_events) ||
+	    !fifo->irq_info[irq_id].irq)
+		return;
+
+	for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++) {
+		tm_vdev = fifo->vdev[i];
+		if (tm_vdev) {
+			vring = &tm_vdev->vrings[queue_id];
+			if (vring->vq)
+				mlxbf_tmfifo_rxtx(vring, is_rx);
+		}
+	}
+}
+
+/* Work handler for Rx and Tx case. */
+static void mlxbf_tmfifo_work_handler(struct work_struct *work)
+{
+	struct mlxbf_tmfifo *fifo;
+
+	fifo = container_of(work, struct mlxbf_tmfifo, work);
+	if (!fifo->is_ready)
+		return;
+
+	mutex_lock(&fifo->lock);
+
+	/* Tx (Send data to the TmFifo). */
+	mlxbf_tmfifo_work_rxtx(fifo, MLXBF_TMFIFO_VRING_TX,
+			       MLXBF_TM_TX_LWM_IRQ, false);
+
+	/* Rx (Receive data from the TmFifo). */
+	mlxbf_tmfifo_work_rxtx(fifo, MLXBF_TMFIFO_VRING_RX,
+			       MLXBF_TM_RX_HWM_IRQ, true);
+
+	mutex_unlock(&fifo->lock);
+}
+
+/* The notify function is called when new buffers are posted. */
+static bool mlxbf_tmfifo_virtio_notify(struct virtqueue *vq)
+{
+	struct mlxbf_tmfifo_vring *vring = vq->priv;
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct mlxbf_tmfifo *fifo;
+	unsigned long flags;
+
+	fifo = vring->fifo;
+
+	/*
+	 * Virtio maintains vrings in pairs, even number ring for Rx
+	 * and odd number ring for Tx.
+	 */
+	if (!(vring->index & BIT(0))) {
+		if (test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events))
+			return true;
+	} else {
+		/*
+		 * Console could make blocking call with interrupts disabled.
+		 * In such case, the vring needs to be served right away. For
+		 * other cases, just set the TX LWM bit to start Tx in the
+		 * worker handler.
+		 */
+		if (vring->vdev_id == VIRTIO_ID_CONSOLE) {
+			spin_lock_irqsave(&fifo->spin_lock, flags);
+			tm_vdev = fifo->vdev[VIRTIO_ID_CONSOLE];
+			mlxbf_tmfifo_console_output(tm_vdev, vring);
+			spin_unlock_irqrestore(&fifo->spin_lock, flags);
+		} else if (test_and_set_bit(MLXBF_TM_TX_LWM_IRQ,
+					    &fifo->pend_events)) {
+			return true;
+		}
+	}
+
+	schedule_work(&fifo->work);
+
+	return true;
+}
+
+/* Get the array of feature bits for this device. */
+static u64 mlxbf_tmfifo_virtio_get_features(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	return tm_vdev->features;
+}
+
+/* Confirm device features to use. */
+static int mlxbf_tmfifo_virtio_finalize_features(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->features = vdev->features;
+
+	return 0;
+}
+
+/* Free virtqueues found by find_vqs(). */
+static void mlxbf_tmfifo_virtio_del_vqs(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+	struct mlxbf_tmfifo_vring *vring;
+	struct virtqueue *vq;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+
+		/* Release the pending packet. */
+		if (vring->desc)
+			mlxbf_tmfifo_release_pending_pkt(vring);
+		vq = vring->vq;
+		if (vq) {
+			vring->vq = NULL;
+			vring_del_virtqueue(vq);
+		}
+	}
+}
+
+/* Create and initialize the virtual queues. */
+static int mlxbf_tmfifo_virtio_find_vqs(struct virtio_device *vdev,
+					unsigned int nvqs,
+					struct virtqueue *vqs[],
+					vq_callback_t *callbacks[],
+					const char * const names[],
+					const bool *ctx,
+					struct irq_affinity *desc)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+	struct mlxbf_tmfifo_vring *vring;
+	struct virtqueue *vq;
+	int i, ret, size;
+
+	if (nvqs > ARRAY_SIZE(tm_vdev->vrings))
+		return -EINVAL;
+
+	for (i = 0; i < nvqs; ++i) {
+		if (!names[i]) {
+			ret = -EINVAL;
+			goto error;
+		}
+		vring = &tm_vdev->vrings[i];
+
+		/* zero vring */
+		size = vring_size(vring->num, vring->align);
+		memset(vring->va, 0, size);
+		vq = vring_new_virtqueue(i, vring->num, vring->align, vdev,
+					 false, false, vring->va,
+					 mlxbf_tmfifo_virtio_notify,
+					 callbacks[i], names[i]);
+		if (!vq) {
+			dev_err(&vdev->dev, "vring_new_virtqueue failed\n");
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		vqs[i] = vq;
+		vring->vq = vq;
+		vq->priv = vring;
+	}
+
+	return 0;
+
+error:
+	mlxbf_tmfifo_virtio_del_vqs(vdev);
+	return ret;
+}
+
+/* Read the status byte. */
+static u8 mlxbf_tmfifo_virtio_get_status(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	return tm_vdev->status;
+}
+
+/* Write the status byte. */
+static void mlxbf_tmfifo_virtio_set_status(struct virtio_device *vdev,
+					   u8 status)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->status = status;
+}
+
+/* Reset the device. Not much here for now. */
+static void mlxbf_tmfifo_virtio_reset(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->status = 0;
+}
+
+/* Read the value of a configuration field. */
+static void mlxbf_tmfifo_virtio_get(struct virtio_device *vdev,
+				    unsigned int offset,
+				    void *buf,
+				    unsigned int len)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	if (offset + len > sizeof(tm_vdev->config))
+		return;
+
+	memcpy(buf, (u8 *)&tm_vdev->config + offset, len);
+}
+
+/* Write the value of a configuration field. */
+static void mlxbf_tmfifo_virtio_set(struct virtio_device *vdev,
+				    unsigned int offset,
+				    const void *buf,
+				    unsigned int len)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	if (offset + len > sizeof(tm_vdev->config))
+		return;
+
+	memcpy((u8 *)&tm_vdev->config + offset, buf, len);
+}
+
+static void tmfifo_virtio_dev_release(struct device *device)
+{
+	struct virtio_device *vdev =
+			container_of(device, struct virtio_device, dev);
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	kfree(tm_vdev);
+}
+
+/* Virtio config operations. */
+static const struct virtio_config_ops mlxbf_tmfifo_virtio_config_ops = {
+	.get_features = mlxbf_tmfifo_virtio_get_features,
+	.finalize_features = mlxbf_tmfifo_virtio_finalize_features,
+	.find_vqs = mlxbf_tmfifo_virtio_find_vqs,
+	.del_vqs = mlxbf_tmfifo_virtio_del_vqs,
+	.reset = mlxbf_tmfifo_virtio_reset,
+	.set_status = mlxbf_tmfifo_virtio_set_status,
+	.get_status = mlxbf_tmfifo_virtio_get_status,
+	.get = mlxbf_tmfifo_virtio_get,
+	.set = mlxbf_tmfifo_virtio_set,
+};
+
+/* Create vdev type in a tmfifo. */
+static int mlxbf_tmfifo_create_vdev(struct device *dev,
+				    struct mlxbf_tmfifo *fifo,
+				    int vdev_id, u64 features,
+				    void *config, u32 size)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev, *reg_dev = NULL;
+	int ret;
+
+	mutex_lock(&fifo->lock);
+
+	tm_vdev = fifo->vdev[vdev_id];
+	if (tm_vdev) {
+		dev_err(dev, "vdev %d already exists\n", vdev_id);
+		ret = -EEXIST;
+		goto fail;
+	}
+
+	tm_vdev = kzalloc(sizeof(*tm_vdev), GFP_KERNEL);
+	if (!tm_vdev) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	tm_vdev->vdev.id.device = vdev_id;
+	tm_vdev->vdev.config = &mlxbf_tmfifo_virtio_config_ops;
+	tm_vdev->vdev.dev.parent = &fifo->pdev->dev;
+	tm_vdev->vdev.dev.release = tmfifo_virtio_dev_release;
+	tm_vdev->features = features;
+	if (config)
+		memcpy(&tm_vdev->config, config, size);
+
+	if (mlxbf_tmfifo_alloc_vrings(fifo, tm_vdev)) {
+		dev_err(dev, "unable to allocate vring\n");
+		ret = -ENOMEM;
+		goto vdev_fail;
+	}
+
+	/* Allocate an output buffer for the console device. */
+	if (vdev_id == VIRTIO_ID_CONSOLE)
+		tm_vdev->tx_buf.buf = devm_kmalloc(dev,
+						   MLXBF_TMFIFO_CON_TX_BUF_SIZE,
+						   GFP_KERNEL);
+	fifo->vdev[vdev_id] = tm_vdev;
+
+	/* Register the virtio device. */
+	ret = register_virtio_device(&tm_vdev->vdev);
+	reg_dev = tm_vdev;
+	if (ret) {
+		dev_err(&fifo->pdev->dev, "register_virtio_device failed\n");
+		goto vdev_fail;
+	}
+
+	mutex_unlock(&fifo->lock);
+	return 0;
+
+vdev_fail:
+	mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
+	fifo->vdev[vdev_id] = NULL;
+	if (reg_dev)
+		put_device(&tm_vdev->vdev.dev);
+	else
+		kfree(tm_vdev);
+fail:
+	mutex_unlock(&fifo->lock);
+	return ret;
+}
+
+/* Delete vdev type from a tmfifo. */
+static int mlxbf_tmfifo_delete_vdev(struct mlxbf_tmfifo *fifo, int vdev_id)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	mutex_lock(&fifo->lock);
+
+	/* Unregister vdev. */
+	tm_vdev = fifo->vdev[vdev_id];
+	if (tm_vdev) {
+		unregister_virtio_device(&tm_vdev->vdev);
+		mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
+		fifo->vdev[vdev_id] = NULL;
+	}
+
+	mutex_unlock(&fifo->lock);
+
+	return 0;
+}
+
+/* Read the configured network MAC address from efi variable. */
+static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
+{
+	efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
+	unsigned long size = ETH_ALEN;
+	efi_status_t status;
+	u8 buf[ETH_ALEN];
+
+	status = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size,
+				  buf);
+	if (status == EFI_SUCCESS && size == ETH_ALEN)
+		ether_addr_copy(mac, buf);
+	else
+		memcpy(mac, mlxbf_tmfifo_net_default_mac, ETH_ALEN);
+}
+
+/* Set TmFifo thresholds which are used to trigger interrupts. */
+static void mlxbf_tmfifo_set_threshold(struct mlxbf_tmfifo *fifo)
+{
+	u64 ctl;
+
+	/* Get Tx FIFO size and set the low/high watermark. */
+	ctl = readq(fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
+	fifo->tx_fifo_size =
+		FIELD_GET(MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK, ctl);
+	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__LWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__LWM_MASK,
+			   fifo->tx_fifo_size / 2);
+	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__HWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__HWM_MASK,
+			   fifo->tx_fifo_size - 1);
+	writeq(ctl, fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
+
+	/* Get Rx FIFO size and set the low/high watermark. */
+	ctl = readq(fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
+	fifo->rx_fifo_size =
+		FIELD_GET(MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK, ctl);
+	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__LWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__LWM_MASK, 0);
+	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__HWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__HWM_MASK, 1);
+	writeq(ctl, fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
+}
+
+static void mlxbf_tmfifo_cleanup(struct mlxbf_tmfifo *fifo)
+{
+	int i;
+
+	fifo->is_ready = false;
+	del_timer_sync(&fifo->timer);
+	mlxbf_tmfifo_disable_irqs(fifo);
+	cancel_work_sync(&fifo->work);
+	for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++)
+		mlxbf_tmfifo_delete_vdev(fifo, i);
+}
+
+/* Probe the TMFIFO. */
+static int mlxbf_tmfifo_probe(struct platform_device *pdev)
+{
+	struct virtio_net_config net_config;
+	struct mlxbf_tmfifo *fifo;
+	struct resource *res;
+	int i, ret;
+
+	fifo = devm_kzalloc(&pdev->dev, sizeof(*fifo), GFP_KERNEL);
+	if (!fifo)
+		return -ENOMEM;
+
+	spin_lock_init(&fifo->spin_lock);
+	INIT_WORK(&fifo->work, mlxbf_tmfifo_work_handler);
+	mutex_init(&fifo->lock);
+
+	/* Get the resource of the Rx FIFO. */
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	fifo->rx_base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(fifo->rx_base))
+		return PTR_ERR(fifo->rx_base);
+
+	/* Get the resource of the Tx FIFO. */
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+	fifo->tx_base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(fifo->tx_base))
+		return PTR_ERR(fifo->tx_base);
+
+	fifo->pdev = pdev;
+	platform_set_drvdata(pdev, fifo);
+
+	timer_setup(&fifo->timer, mlxbf_tmfifo_timer, 0);
+
+	for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
+		fifo->irq_info[i].index = i;
+		fifo->irq_info[i].fifo = fifo;
+		fifo->irq_info[i].irq = platform_get_irq(pdev, i);
+		ret = devm_request_irq(&pdev->dev, fifo->irq_info[i].irq,
+				       mlxbf_tmfifo_irq_handler, 0,
+				       "tmfifo", &fifo->irq_info[i]);
+		if (ret) {
+			dev_err(&pdev->dev, "devm_request_irq failed\n");
+			fifo->irq_info[i].irq = 0;
+			return ret;
+		}
+	}
+
+	mlxbf_tmfifo_set_threshold(fifo);
+
+	/* Create the console vdev. */
+	ret = mlxbf_tmfifo_create_vdev(&pdev->dev, fifo, VIRTIO_ID_CONSOLE, 0,
+				       NULL, 0);
+	if (ret)
+		goto fail;
+
+	/* Create the network vdev. */
+	memset(&net_config, 0, sizeof(net_config));
+	net_config.mtu = ETH_DATA_LEN;
+	net_config.status = VIRTIO_NET_S_LINK_UP;
+	mlxbf_tmfifo_get_cfg_mac(net_config.mac);
+	ret = mlxbf_tmfifo_create_vdev(&pdev->dev, fifo, VIRTIO_ID_NET,
+				       MLXBF_TMFIFO_NET_FEATURES, &net_config,
+				       sizeof(net_config));
+	if (ret)
+		goto fail;
+
+	mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
+
+	fifo->is_ready = true;
+	return 0;
+
+fail:
+	mlxbf_tmfifo_cleanup(fifo);
+	return ret;
+}
+
+/* Device remove function. */
+static int mlxbf_tmfifo_remove(struct platform_device *pdev)
+{
+	struct mlxbf_tmfifo *fifo = platform_get_drvdata(pdev);
+
+	mlxbf_tmfifo_cleanup(fifo);
+
+	return 0;
+}
+
+static const struct acpi_device_id mlxbf_tmfifo_acpi_match[] = {
+	{ "MLNXBF01", 0 },
+	{}
+};
+MODULE_DEVICE_TABLE(acpi, mlxbf_tmfifo_acpi_match);
+
+static struct platform_driver mlxbf_tmfifo_driver = {
+	.probe = mlxbf_tmfifo_probe,
+	.remove = mlxbf_tmfifo_remove,
+	.driver = {
+		.name = "bf-tmfifo",
+		.acpi_match_table = mlxbf_tmfifo_acpi_match,
+	},
+};
+
+module_platform_driver(mlxbf_tmfifo_driver);
+
+MODULE_DESCRIPTION("Mellanox BlueField SoC TmFifo Driver");
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Mellanox Technologies");
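
For reference, the framing used by mlxbf_tmfifo_rxtx_header() and
mlxbf_tmfifo_console_tx() above is one 8-byte header word followed by the
payload padded up to whole 64-bit words. A small sketch of the resulting
word count (illustrative only; the helper name is made up):

    /* FIFO words needed for a payload of 'len' bytes: one header word
     * plus the payload rounded up to whole 64-bit words. */
    static unsigned int example_fifo_words(unsigned int len)
    {
            return 1 + DIV_ROUND_UP(len, sizeof(u64));
    }

For example, a 100-byte console chunk takes 1 + 13 = 14 words, and the
smallest message carrying data takes MLXBF_TMFIFO_DATA_MIN_WORDS (2).
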
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v13] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
       [not found] <b143b40446c1870fb8d422b364ead95d54552be9.1527264077.git.lsun@mellanox.com>
                   ` (7 preceding siblings ...)
  2019-03-28 19:56 ` [PATCH v12] " Liming Sun
@ 2019-04-04 19:36 ` Liming Sun
  2019-04-05 15:44   ` Andy Shevchenko
  2019-04-07  2:03 ` [PATCH v14] " Liming Sun
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: Liming Sun @ 2019-04-04 19:36 UTC (permalink / raw)
  To: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak
  Cc: Liming Sun, linux-kernel, platform-driver-x86

This commit adds the TmFifo platform driver for Mellanox BlueField
Soc. TmFifo is a shared FIFO which enables external host machine
to exchange data with the SoC via USB or PCIe. The driver is based
on virtio framework and has console and network access enabled.

Reviewed-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Liming Sun <lsun@mellanox.com>
---
v12->v13:
    Rebase and resubmit (no new changes).
v11->v12:
    Fixed the two unsolved comments from v11.
    - "Change macro mlxbf_vdev_to_tmfifo() to one line"
      Done. Seems not hard.
    - "Is it appropriate use of devm_* for 'tm_vdev = devm_kzalloc'"
      Yes, understand the comment now. The tmfifo is fixed, but the
      vdev is dynamic. Use kzalloc() instead, and free the device
      in the release callback which is the right place for it.
v10->v11:
    Fixes for comments from Andy:
    - Use GENMASK_ULL() instead of GENMASK() in mlxbf-tmfifo-regs.h
    - Removed the cpu_to_le64()/le64_to_cpu() conversion since
      readq()/writeq() already takes care of it.
    - Remove the "if (irq)" check in mlxbf_tmfifo_disable_irqs().
    - Add "u32 count" temp variable in mlxbf_tmfifo_get_tx_avail().
    - Clean up mlxbf_tmfifo_get_cfg_mac(), use ETH_ALEN instead of
      value 6.
    - Change the tx_buf to use the existing Linux 'struct circ_buf'
      (see the sketch after this changelog).
    Comment not applied:
    - "Change macro mlxbf_vdev_to_tmfifo() to one line"
      Couldn't fit in one line with 80 characters.
    - "Is it appropriate use of devm_* for 'tm_vdev = devm_kzalloc'"
      This is a SoC; the device won't be closed or detached.
      The only case is when the driver is unloaded. So it appears
      ok to use devm_kzalloc() since it's allocated during probe()
      and released during module unload.
    Comments from Vadim: OK
v9->v10:
    Fixes for comments from Andy:
    - Use devm_ioremap_resource() instead of devm_ioremap().
    - Use kernel-doc comments.
    - Keep Makefile contents sorted.
    - Use same fixed format for offsets.
    - Use SZ_1K/SZ_32K instead of 1024/32*1024.
    - Remove unnecessary comments.
    - Use one style for max numbers.
    - More comments for mlxbf_tmfifo_vdev and mlxbf_tmfifo_data_64bit.
    - Use globally defined MTU instead of new definition.
    - Remove forward declaration of mlxbf_tmfifo_remove().
    - Remove PAGE_ALIGN() for dma_alloc_coherent)().
    - Remove the cast of "struct vring *".
    - Check return result of test_and_set_bit().
    - Add a macro mlxbt_vdev_to_tmfifo().
    - Several other minor coding style comments.
    Comment not applied:
    - "Shouldn't be rather helper in EFI lib in kernel"
      Looks like efi.get_variable() is the interface I found for this in
      the kernel tree.
    - "this one is not protected anyhow? Potential race condition"
      In mlxbf_tmfifo_console_tx(), the spin-lock is used to protect the
      'tx_buf' only, not the FIFO writes. So there is no race condition.
    - "Is __packed needed in mlxbf_tmfifo_msg_hdr".
      Yes, it is needed to make sure the structure is 8 bytes.
    Fixes for comments from Vadim:
    - Use tab in mlxbf-tmfifo-regs.h
    - Use kernel-doc comments for struct mlxbf_tmfifo_msg_hdr and
      mlxbf_tmfifo_irq_info as well.
    - Use _MAX instead of _CNT in the macro definition to be consistent.
    - Fix the MODULE_LICENSE.
    - Use BIT_ULL() instead of BIT().
    - Remove argument of 'avail' for mlxbf_tmfifo_rxtx_header() and
      mlxbf_tmfifo_rxtx_word()
    - Revise logic in mlxbf_tmfifo_rxtx_one_desc() to remove the
      WARN_ON().
    - Change "union mlxbf_tmfifo_u64 u" to "union mlxbf_tmfifo_u64 buf"
      in mlxbf_tmfifo_rxtx_word().
    - Change data type of vring_change from 'int' to 'bool'.
    - Remove the blank lines after Signed-off.
    - Don’t use declaration in the middle.
    - Make the network header initialization more elegant.
    - Change label done to mlxbf_tmfifo_desc_done.
    - Remove some unnecessary comments, and several other misc coding
      style comments.
    - Simplify code logic in mlxbf_tmfifo_virtio_notify()
    New changes by Liming:
    - Simplify the Rx/Tx function arguments to make them more readable.
v8->v9:
    Fixes for comments from Andy:
    - Use modern devm_xxx() API instead.
    Fixes for comments from Vadim:
    - Split the Rx/Tx function into smaller functions.
    - File name, copyright information.
    - Function and variable name conversion.
    - Local variable and indent coding styles.
    - Remove unnecessary 'inline' declarations.
    - Use devm_xxx() APIs.
    - Move the efi_char16_t MAC address definition to global.
    - Fix warnings reported by 'checkpatch --strict'.
    - Fix warnings reported by 'make CF="-D__CHECK_ENDIAN__"'.
    - Change 'select VIRTIO_xxx' to 'depends on VIRTIO_xxx' in Kconfig.
    - Merge mlxbf_tmfifo_vdev_tx_buf_push() and
      mlxbf_tmfifo_vdev_tx_buf_pop().
    - Add union to avoid casting between __le64 and u64.
    - Several other misc coding style comments.
    New changes by Liming:
    - Removed the DT binding documentation since only ACPI is
      supported for now by UEFI on the SoC.
v8: Re-submit under drivers/platform/mellanox for the target-side
    platform driver only.
v7: Added host side drivers into the same patch set.
v5~v6: Coding style fix.
v1~v4: Initial version for directory drivers/soc/mellanox.
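
A minimal sketch of the circ_buf bookkeeping now used for the console
tx_buf (illustrative only; the helper name is made up, the real buffer is
MLXBF_TMFIFO_CON_TX_BUF_SIZE bytes and is protected by the fifo spin lock):

    #include <linux/circ_buf.h>
    #include <linux/string.h>
    #include <linux/types.h>

    /* Append 'len' bytes to a power-of-two sized circular buffer. The
     * caller is assumed to have checked CIRC_SPACE() and to hold the
     * lock that protects the buffer. */
    static void example_circ_push(struct circ_buf *cb, int size,
                                  const u8 *data, int len)
    {
            int seg = CIRC_SPACE_TO_END(cb->head, cb->tail, size);

            if (len <= seg) {
                    memcpy(cb->buf + cb->head, data, len);
            } else {
                    memcpy(cb->buf + cb->head, data, seg);
                    memcpy(cb->buf, data + seg, len - seg);
            }
            cb->head = (cb->head + len) % size;
    }

mlxbf_tmfifo_console_output_one() in the patch below follows the same
pattern when copying virtio descriptors into the console Tx buffer.
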
---
 drivers/platform/mellanox/Kconfig             |   12 +-
 drivers/platform/mellanox/Makefile            |    1 +
 drivers/platform/mellanox/mlxbf-tmfifo-regs.h |   63 ++
 drivers/platform/mellanox/mlxbf-tmfifo.c      | 1291 +++++++++++++++++++++++++
 4 files changed, 1366 insertions(+), 1 deletion(-)
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo-regs.h
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo.c

diff --git a/drivers/platform/mellanox/Kconfig b/drivers/platform/mellanox/Kconfig
index cd8a908..530fe7e 100644
--- a/drivers/platform/mellanox/Kconfig
+++ b/drivers/platform/mellanox/Kconfig
@@ -5,7 +5,7 @@
 
 menuconfig MELLANOX_PLATFORM
 	bool "Platform support for Mellanox hardware"
-	depends on X86 || ARM || COMPILE_TEST
+	depends on X86 || ARM || ARM64 || COMPILE_TEST
 	---help---
 	  Say Y here to get to see options for platform support for
 	  Mellanox systems. This option alone does not add any kernel code.
@@ -34,4 +34,14 @@ config MLXREG_IO
 	  to system resets operation, system reset causes monitoring and some
 	  kinds of mux selection.
 
+config MLXBF_TMFIFO
+	tristate "Mellanox BlueField SoC TmFifo platform driver"
+	depends on ARM64
+	depends on ACPI
+	depends on VIRTIO_CONSOLE && VIRTIO_NET
+	help
+	  Say y here to enable TmFifo support. The TmFifo driver provides
+          platform driver support for the TmFifo which supports console
+          and networking based on the virtio framework.
+
 endif # MELLANOX_PLATFORM
diff --git a/drivers/platform/mellanox/Makefile b/drivers/platform/mellanox/Makefile
index 57074d9c..a229bda1 100644
--- a/drivers/platform/mellanox/Makefile
+++ b/drivers/platform/mellanox/Makefile
@@ -3,5 +3,6 @@
 # Makefile for linux/drivers/platform/mellanox
 # Mellanox Platform-Specific Drivers
 #
+obj-$(CONFIG_MLXBF_TMFIFO)	+= mlxbf-tmfifo.o
 obj-$(CONFIG_MLXREG_HOTPLUG)	+= mlxreg-hotplug.o
 obj-$(CONFIG_MLXREG_IO) += mlxreg-io.o
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo-regs.h b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
new file mode 100644
index 0000000..e4f0d2e
--- /dev/null
+++ b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
+ */
+
+#ifndef __MLXBF_TMFIFO_REGS_H__
+#define __MLXBF_TMFIFO_REGS_H__
+
+#include <linux/types.h>
+#include <linux/bits.h>
+
+#define MLXBF_TMFIFO_TX_DATA				0x00
+#define MLXBF_TMFIFO_TX_STS				0x08
+#define MLXBF_TMFIFO_TX_STS__LENGTH			0x0001
+#define MLXBF_TMFIFO_TX_STS__COUNT_SHIFT		0
+#define MLXBF_TMFIFO_TX_STS__COUNT_WIDTH		9
+#define MLXBF_TMFIFO_TX_STS__COUNT_RESET_VAL		0
+#define MLXBF_TMFIFO_TX_STS__COUNT_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_TX_STS__COUNT_MASK			GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_TX_CTL				0x10
+#define MLXBF_TMFIFO_TX_CTL__LENGTH			0x0001
+#define MLXBF_TMFIFO_TX_CTL__LWM_SHIFT			0
+#define MLXBF_TMFIFO_TX_CTL__LWM_WIDTH			8
+#define MLXBF_TMFIFO_TX_CTL__LWM_RESET_VAL		128
+#define MLXBF_TMFIFO_TX_CTL__LWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__LWM_MASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__HWM_SHIFT			8
+#define MLXBF_TMFIFO_TX_CTL__HWM_WIDTH			8
+#define MLXBF_TMFIFO_TX_CTL__HWM_RESET_VAL		128
+#define MLXBF_TMFIFO_TX_CTL__HWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__HWM_MASK			GENMASK_ULL(15, 8)
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_SHIFT		32
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_WIDTH		9
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RESET_VAL	256
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK		GENMASK_ULL(40, 32)
+#define MLXBF_TMFIFO_RX_DATA				0x00
+#define MLXBF_TMFIFO_RX_STS				0x08
+#define MLXBF_TMFIFO_RX_STS__LENGTH			0x0001
+#define MLXBF_TMFIFO_RX_STS__COUNT_SHIFT		0
+#define MLXBF_TMFIFO_RX_STS__COUNT_WIDTH		9
+#define MLXBF_TMFIFO_RX_STS__COUNT_RESET_VAL		0
+#define MLXBF_TMFIFO_RX_STS__COUNT_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_RX_STS__COUNT_MASK			GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_RX_CTL				0x10
+#define MLXBF_TMFIFO_RX_CTL__LENGTH			0x0001
+#define MLXBF_TMFIFO_RX_CTL__LWM_SHIFT			0
+#define MLXBF_TMFIFO_RX_CTL__LWM_WIDTH			8
+#define MLXBF_TMFIFO_RX_CTL__LWM_RESET_VAL		128
+#define MLXBF_TMFIFO_RX_CTL__LWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__LWM_MASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__HWM_SHIFT			8
+#define MLXBF_TMFIFO_RX_CTL__HWM_WIDTH			8
+#define MLXBF_TMFIFO_RX_CTL__HWM_RESET_VAL		128
+#define MLXBF_TMFIFO_RX_CTL__HWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__HWM_MASK			GENMASK_ULL(15, 8)
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_SHIFT		32
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_WIDTH		9
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RESET_VAL	256
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK		GENMASK_ULL(40, 32)
+
+#endif /* !defined(__MLXBF_TMFIFO_REGS_H__) */
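
The LWM/HWM fields above are programmed with a read-modify-write using
FIELD_PREP(). A minimal sketch, mirroring what mlxbf_tmfifo_set_threshold()
in the driver does (the function name here is made up):

    #include <linux/bitfield.h>
    #include <linux/io.h>
    #include "mlxbf-tmfifo-regs.h"

    /* Program the Rx watermarks: LWM = 0 and HWM = 1, so incoming data
     * is noticed as soon as a single word is queued. */
    static void example_set_rx_watermarks(void __iomem *rx_base)
    {
            u64 ctl = readq(rx_base + MLXBF_TMFIFO_RX_CTL);

            ctl &= ~(MLXBF_TMFIFO_RX_CTL__LWM_MASK |
                     MLXBF_TMFIFO_RX_CTL__HWM_MASK);
            ctl |= FIELD_PREP(MLXBF_TMFIFO_RX_CTL__LWM_MASK, 0) |
                   FIELD_PREP(MLXBF_TMFIFO_RX_CTL__HWM_MASK, 1);
            writeq(ctl, rx_base + MLXBF_TMFIFO_RX_CTL);
    }
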
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo.c b/drivers/platform/mellanox/mlxbf-tmfifo.c
new file mode 100644
index 0000000..2bc03c3
--- /dev/null
+++ b/drivers/platform/mellanox/mlxbf-tmfifo.c
@@ -0,0 +1,1291 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Mellanox BlueField SoC TmFifo driver
+ *
+ * Copyright (C) 2019 Mellanox Technologies
+ */
+
+#include <linux/acpi.h>
+#include <linux/bitfield.h>
+#include <linux/circ_buf.h>
+#include <linux/efi.h>
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/platform_device.h>
+#include <linux/types.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_console.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_net.h>
+#include <linux/virtio_ring.h>
+
+#include "mlxbf-tmfifo-regs.h"
+
+/* Vring size. */
+#define MLXBF_TMFIFO_VRING_SIZE			SZ_1K
+
+/* Console Tx buffer size. */
+#define MLXBF_TMFIFO_CON_TX_BUF_SIZE		SZ_32K
+
+/* Console Tx buffer reserved space. */
+#define MLXBF_TMFIFO_CON_TX_BUF_RSV_SIZE	8
+
+/* House-keeping timer interval. */
+#define MLXBF_TMFIFO_TIMER_INTERVAL		(HZ / 10)
+
+/* Virtual devices sharing the TM FIFO. */
+#define MLXBF_TMFIFO_VDEV_MAX		(VIRTIO_ID_CONSOLE + 1)
+
+/*
+ * Reserve 1/16 of TmFifo space, so console messages are not starved by
+ * the networking traffic.
+ */
+#define MLXBF_TMFIFO_RESERVE_RATIO		16
+
+/* Message with data needs at least two words (for header & data). */
+#define MLXBF_TMFIFO_DATA_MIN_WORDS		2
+
+struct mlxbf_tmfifo;
+
+/**
+ * mlxbf_tmfifo_vring - Structure of the TmFifo virtual ring
+ * @va: virtual address of the ring
+ * @dma: dma address of the ring
+ * @vq: pointer to the virtio virtqueue
+ * @desc: current descriptor of the pending packet
+ * @desc_head: head descriptor of the pending packet
+ * @cur_len: processed length of the current descriptor
+ * @rem_len: remaining length of the pending packet
+ * @pkt_len: total length of the pending packet
+ * @next_avail: next avail descriptor id
+ * @num: vring size (number of descriptors)
+ * @align: vring alignment size
+ * @index: vring index
+ * @vdev_id: vring virtio id (VIRTIO_ID_xxx)
+ * @fifo: pointer to the tmfifo structure
+ */
+struct mlxbf_tmfifo_vring {
+	void *va;
+	dma_addr_t dma;
+	struct virtqueue *vq;
+	struct vring_desc *desc;
+	struct vring_desc *desc_head;
+	int cur_len;
+	int rem_len;
+	u32 pkt_len;
+	u16 next_avail;
+	int num;
+	int align;
+	int index;
+	int vdev_id;
+	struct mlxbf_tmfifo *fifo;
+};
+
+/* Interrupt types. */
+enum {
+	MLXBF_TM_RX_LWM_IRQ,
+	MLXBF_TM_RX_HWM_IRQ,
+	MLXBF_TM_TX_LWM_IRQ,
+	MLXBF_TM_TX_HWM_IRQ,
+	MLXBF_TM_MAX_IRQ
+};
+
+/* Ring types (Rx & Tx). */
+enum {
+	MLXBF_TMFIFO_VRING_RX,
+	MLXBF_TMFIFO_VRING_TX,
+	MLXBF_TMFIFO_VRING_MAX
+};
+
+/**
+ * mlxbf_tmfifo_vdev - Structure of the TmFifo virtual device
+ * @vdev: virtio device, in which the vdev.id.device field has the
+ *        VIRTIO_ID_xxx id to distinguish the virtual device.
+ * @status: status of the device
+ * @features: supported features of the device
+ * @vrings: array of tmfifo vrings of this device
+ * @config.cons: virtual console config -
+ *               select if vdev.id.device is VIRTIO_ID_CONSOLE
+ * @config.net: virtual network config -
+ *              select if vdev.id.device is VIRTIO_ID_NET
+ * @tx_buf: tx buffer used to buffer data before writing into the FIFO
+ */
+struct mlxbf_tmfifo_vdev {
+	struct virtio_device vdev;
+	u8 status;
+	u64 features;
+	struct mlxbf_tmfifo_vring vrings[MLXBF_TMFIFO_VRING_MAX];
+	union {
+		struct virtio_console_config cons;
+		struct virtio_net_config net;
+	} config;
+	struct circ_buf tx_buf;
+};
+
+/**
+ * mlxbf_tmfifo_irq_info - Structure of the interrupt information
+ * @fifo: pointer to the tmfifo structure
+ * @irq: interrupt number
+ * @index: index into the interrupt array
+ */
+struct mlxbf_tmfifo_irq_info {
+	struct mlxbf_tmfifo *fifo;
+	int irq;
+	int index;
+};
+
+/**
+ * mlxbf_tmfifo - Structure of the TmFifo
+ * @vdev: array of the virtual devices running over the TmFifo
+ * @pdev: platform device
+ * @lock: lock to protect the TmFifo access
+ * @rx_base: mapped register base address for the Rx fifo
+ * @tx_base: mapped register base address for the Tx fifo
+ * @rx_fifo_size: number of entries of the Rx fifo
+ * @tx_fifo_size: number of entries of the Tx fifo
+ * @pend_events: pending bits for deferred events
+ * @irq_info: interrupt information
+ * @work: work struct for deferred process
+ * @timer: background timer
+ * @vring: Tx/Rx ring
+ * @spin_lock: spin lock
+ * @is_ready: ready flag
+ */
+struct mlxbf_tmfifo {
+	struct mlxbf_tmfifo_vdev *vdev[MLXBF_TMFIFO_VDEV_MAX];
+	struct platform_device *pdev;
+	struct mutex lock;		/* TmFifo lock */
+	void __iomem *rx_base;
+	void __iomem *tx_base;
+	int rx_fifo_size;
+	int tx_fifo_size;
+	unsigned long pend_events;
+	struct mlxbf_tmfifo_irq_info irq_info[MLXBF_TM_MAX_IRQ];
+	struct work_struct work;
+	struct timer_list timer;
+	struct mlxbf_tmfifo_vring *vring[2];
+	spinlock_t spin_lock;		/* spin lock */
+	bool is_ready;
+};
+
+/**
+ * mlxbf_tmfifo_msg_hdr - Structure of the TmFifo message header
+ * @type: message type
+ * @len: payload length
+ * @data: 64-bit union data
+ */
+union mlxbf_tmfifo_msg_hdr {
+	struct {
+		u8 type;
+		__be16 len;
+		u8 unused[5];
+	} __packed;
+	u64 data;
+};
+
+/*
+ * Default MAC.
+ * This MAC address will be read from EFI persistent variable if configured.
+ * It can also be reconfigured with standard Linux tools.
+ */
+static u8 mlxbf_tmfifo_net_default_mac[ETH_ALEN] = {
+	0x00, 0x1A, 0xCA, 0xFF, 0xFF, 0x01};
+
+/* EFI variable name of the MAC address. */
+static efi_char16_t mlxbf_tmfifo_efi_name[] = L"RshimMacAddr";
+
+/* Maximum L2 header length. */
+#define MLXBF_TMFIFO_NET_L2_OVERHEAD	36
+
+/* Supported virtio-net features. */
+#define MLXBF_TMFIFO_NET_FEATURES	(BIT_ULL(VIRTIO_NET_F_MTU) | \
+					 BIT_ULL(VIRTIO_NET_F_STATUS) | \
+					 BIT_ULL(VIRTIO_NET_F_MAC))
+
+#define mlxbf_vdev_to_tmfifo(d) container_of(d, struct mlxbf_tmfifo_vdev, vdev)
+
+/* Allocate vrings for the fifo. */
+static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
+				     struct mlxbf_tmfifo_vdev *tm_vdev)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	struct device *dev;
+	dma_addr_t dma;
+	int i, size;
+	void *va;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+		vring->fifo = fifo;
+		vring->num = MLXBF_TMFIFO_VRING_SIZE;
+		vring->align = SMP_CACHE_BYTES;
+		vring->index = i;
+		vring->vdev_id = tm_vdev->vdev.id.device;
+		dev = &tm_vdev->vdev.dev;
+
+		size = vring_size(vring->num, vring->align);
+		va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL);
+		if (!va) {
+			dev_err(dev->parent, "dma_alloc_coherent failed\n");
+			return -ENOMEM;
+		}
+
+		vring->va = va;
+		vring->dma = dma;
+	}
+
+	return 0;
+}
+
+/* Free vrings of the fifo device. */
+static void mlxbf_tmfifo_free_vrings(struct mlxbf_tmfifo *fifo,
+				     struct mlxbf_tmfifo_vdev *tm_vdev)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	int i, size;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+		if (vring->va) {
+			size = vring_size(vring->num, vring->align);
+			dma_free_coherent(tm_vdev->vdev.dev.parent, size,
+					  vring->va, vring->dma);
+			vring->va = NULL;
+			if (vring->vq) {
+				vring_del_virtqueue(vring->vq);
+				vring->vq = NULL;
+			}
+		}
+	}
+}
+
+/* Disable interrupts of the fifo device. */
+static void mlxbf_tmfifo_disable_irqs(struct mlxbf_tmfifo *fifo)
+{
+	int i, irq;
+
+	for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
+		irq = fifo->irq_info[i].irq;
+		fifo->irq_info[i].irq = 0;
+		disable_irq(irq);
+	}
+}
+
+/* Interrupt handler. */
+static irqreturn_t mlxbf_tmfifo_irq_handler(int irq, void *arg)
+{
+	struct mlxbf_tmfifo_irq_info *irq_info = arg;
+
+	if (irq_info->index < MLXBF_TM_MAX_IRQ &&
+	    !test_and_set_bit(irq_info->index, &irq_info->fifo->pend_events))
+		schedule_work(&irq_info->fifo->work);
+
+	return IRQ_HANDLED;
+}
+
+/* Get the next packet descriptor from the vring. */
+static struct vring_desc *
+mlxbf_tmfifo_get_next_desc(struct mlxbf_tmfifo_vring *vring)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	unsigned int idx, head;
+
+	if (vring->next_avail == virtio16_to_cpu(vdev, vr->avail->idx))
+		return NULL;
+
+	idx = vring->next_avail % vr->num;
+	head = virtio16_to_cpu(vdev, vr->avail->ring[idx]);
+	if (WARN_ON(head >= vr->num))
+		return NULL;
+
+	vring->next_avail++;
+
+	return &vr->desc[head];
+}
+
+/* Release virtio descriptor. */
+static void mlxbf_tmfifo_release_desc(struct mlxbf_tmfifo_vring *vring,
+				      struct vring_desc *desc, u32 len)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	u16 idx, vr_idx;
+
+	vr_idx = virtio16_to_cpu(vdev, vr->used->idx);
+	idx = vr_idx % vr->num;
+	vr->used->ring[idx].id = cpu_to_virtio32(vdev, desc - vr->desc);
+	vr->used->ring[idx].len = cpu_to_virtio32(vdev, len);
+
+	/*
+	 * Virtio could poll and check the 'idx' to decide whether the desc is
+	 * done or not. Add a memory barrier here to make sure the update above
+	 * completes before updating the idx.
+	 */
+	mb();
+	vr->used->idx = cpu_to_virtio16(vdev, vr_idx + 1);
+}
+
+/* Get the total length of the descriptor chain. */
+static u32 mlxbf_tmfifo_get_pkt_len(struct mlxbf_tmfifo_vring *vring,
+				    struct vring_desc *desc)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	u32 len = 0, idx;
+
+	while (desc) {
+		len += virtio32_to_cpu(vdev, desc->len);
+		if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
+			break;
+		idx = virtio16_to_cpu(vdev, desc->next);
+		desc = &vr->desc[idx];
+	}
+
+	return len;
+}
+
+static void mlxbf_tmfifo_release_pending_pkt(struct mlxbf_tmfifo_vring *vring)
+{
+	struct vring_desc *desc_head;
+	u32 len = 0;
+
+	if (vring->desc_head) {
+		desc_head = vring->desc_head;
+		len = vring->pkt_len;
+	} else {
+		desc_head = mlxbf_tmfifo_get_next_desc(vring);
+		if (desc_head)
+			len = mlxbf_tmfifo_get_pkt_len(vring, desc_head);
+	}
+
+	if (desc_head)
+		mlxbf_tmfifo_release_desc(vring, desc_head, len);
+
+	vring->pkt_len = 0;
+	vring->desc = NULL;
+	vring->desc_head = NULL;
+}
+
+static void mlxbf_tmfifo_init_net_desc(struct mlxbf_tmfifo_vring *vring,
+				       struct vring_desc *desc, bool is_rx)
+{
+	struct virtio_device *vdev = vring->vq->vdev;
+	struct virtio_net_hdr *net_hdr;
+
+	net_hdr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+	memset(net_hdr, 0, sizeof(*net_hdr));
+}
+
+/* Get and initialize the next packet. */
+static struct vring_desc *
+mlxbf_tmfifo_get_next_pkt(struct mlxbf_tmfifo_vring *vring, bool is_rx)
+{
+	struct vring_desc *desc;
+
+	desc = mlxbf_tmfifo_get_next_desc(vring);
+	if (desc && is_rx && vring->vdev_id == VIRTIO_ID_NET)
+		mlxbf_tmfifo_init_net_desc(vring, desc, is_rx);
+
+	vring->desc_head = desc;
+	vring->desc = desc;
+
+	return desc;
+}
+
+/* House-keeping timer. */
+static void mlxbf_tmfifo_timer(struct timer_list *arg)
+{
+	struct mlxbf_tmfifo *fifo = container_of(arg, struct mlxbf_tmfifo,
+						 timer);
+	int more;
+
+	more = !test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events) ||
+		    !test_and_set_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events);
+
+	if (more)
+		schedule_work(&fifo->work);
+
+	mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
+}
+
+/* Copy one console packet into the output buffer. */
+static void mlxbf_tmfifo_console_output_one(struct mlxbf_tmfifo_vdev *cons,
+					    struct mlxbf_tmfifo_vring *vring,
+					    struct vring_desc *desc)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = &cons->vdev;
+	u32 len, idx, seg;
+	void *addr;
+
+	while (desc) {
+		addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+		len = virtio32_to_cpu(vdev, desc->len);
+
+		seg = CIRC_SPACE_TO_END(cons->tx_buf.head, cons->tx_buf.tail,
+					MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+		if (len <= seg) {
+			memcpy(cons->tx_buf.buf + cons->tx_buf.head, addr, len);
+		} else {
+			memcpy(cons->tx_buf.buf + cons->tx_buf.head, addr, seg);
+			addr += seg;
+			memcpy(cons->tx_buf.buf, addr, len - seg);
+		}
+		cons->tx_buf.head = (cons->tx_buf.head + len) %
+			MLXBF_TMFIFO_CON_TX_BUF_SIZE;
+
+		if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
+			break;
+		idx = virtio16_to_cpu(vdev, desc->next);
+		desc = &vr->desc[idx];
+	}
+}
+
+/* Copy console data into the output buffer. */
+static void mlxbf_tmfifo_console_output(struct mlxbf_tmfifo_vdev *cons,
+					struct mlxbf_tmfifo_vring *vring)
+{
+	struct vring_desc *desc;
+	u32 len, avail;
+
+	desc = mlxbf_tmfifo_get_next_desc(vring);
+	while (desc) {
+		/* Release the packet if not enough space. */
+		len = mlxbf_tmfifo_get_pkt_len(vring, desc);
+		avail = CIRC_SPACE(cons->tx_buf.head, cons->tx_buf.tail,
+				   MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+		if (len + MLXBF_TMFIFO_CON_TX_BUF_RSV_SIZE > avail) {
+			mlxbf_tmfifo_release_desc(vring, desc, len);
+			break;
+		}
+
+		mlxbf_tmfifo_console_output_one(cons, vring, desc);
+		mlxbf_tmfifo_release_desc(vring, desc, len);
+		desc = mlxbf_tmfifo_get_next_desc(vring);
+	}
+}
+
+/* Get the number of available words in Rx FIFO for receiving. */
+static int mlxbf_tmfifo_get_rx_avail(struct mlxbf_tmfifo *fifo)
+{
+	u64 sts;
+
+	sts = readq(fifo->rx_base + MLXBF_TMFIFO_RX_STS);
+	return FIELD_GET(MLXBF_TMFIFO_RX_STS__COUNT_MASK, sts);
+}
+
+/* Get the number of available words in the TmFifo for sending. */
+static int mlxbf_tmfifo_get_tx_avail(struct mlxbf_tmfifo *fifo, int vdev_id)
+{
+	int tx_reserve;
+	u32 count;
+	u64 sts;
+
+	/* Reserve some room in FIFO for console messages. */
+	if (vdev_id == VIRTIO_ID_NET)
+		tx_reserve = fifo->tx_fifo_size / MLXBF_TMFIFO_RESERVE_RATIO;
+	else
+		tx_reserve = 1;
+
+	sts = readq(fifo->tx_base + MLXBF_TMFIFO_TX_STS);
+	count = FIELD_GET(MLXBF_TMFIFO_TX_STS__COUNT_MASK, sts);
+	return fifo->tx_fifo_size - tx_reserve - count;
+}
+
+/* Console Tx (move data from the output buffer into the TmFifo). */
+static void mlxbf_tmfifo_console_tx(struct mlxbf_tmfifo *fifo, int avail)
+{
+	union mlxbf_tmfifo_msg_hdr hdr;
+	struct mlxbf_tmfifo_vdev *cons;
+	unsigned long flags;
+	int size, seg;
+	void *addr;
+	u64 data;
+
+	/* Return if not enough space available. */
+	if (avail < MLXBF_TMFIFO_DATA_MIN_WORDS)
+		return;
+
+	cons = fifo->vdev[VIRTIO_ID_CONSOLE];
+	if (!cons || !cons->tx_buf.buf)
+		return;
+
+	/* Return if no data to send. */
+	size = CIRC_CNT(cons->tx_buf.head, cons->tx_buf.tail,
+			MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+	if (size == 0)
+		return;
+
+	/* Adjust the size to available space. */
+	if (size + sizeof(hdr) > avail * sizeof(u64))
+		size = avail * sizeof(u64) - sizeof(hdr);
+
+	/* Write header. */
+	hdr.data = 0;
+	hdr.type = VIRTIO_ID_CONSOLE;
+	hdr.len = htons(size);
+	writeq(hdr.data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+
+	/* Use spin-lock to protect the 'cons->tx_buf'. */
+	spin_lock_irqsave(&fifo->spin_lock, flags);
+
+	while (size > 0) {
+		addr = cons->tx_buf.buf + cons->tx_buf.tail;
+
+		seg = CIRC_CNT_TO_END(cons->tx_buf.head, cons->tx_buf.tail,
+				      MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+		if (seg >= sizeof(u64)) {
+			memcpy(&data, addr, sizeof(u64));
+		} else {
+			memcpy(&data, addr, seg);
+			memcpy((u8 *)&data + seg, cons->tx_buf.buf,
+			       sizeof(u64) - seg);
+		}
+		writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+
+		if (size >= sizeof(u64)) {
+			cons->tx_buf.tail = (cons->tx_buf.tail + sizeof(u64)) %
+				MLXBF_TMFIFO_CON_TX_BUF_SIZE;
+			size -= sizeof(u64);
+		} else {
+			cons->tx_buf.tail = (cons->tx_buf.tail + size) %
+				MLXBF_TMFIFO_CON_TX_BUF_SIZE;
+			size = 0;
+		}
+	}
+
+	spin_unlock_irqrestore(&fifo->spin_lock, flags);
+}
+
+/* Rx/Tx one word in the descriptor buffer. */
+static void mlxbf_tmfifo_rxtx_word(struct mlxbf_tmfifo_vring *vring,
+				   struct vring_desc *desc,
+				   bool is_rx, int len)
+{
+	struct virtio_device *vdev = vring->vq->vdev;
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	void *addr;
+	u64 data;
+
+	/* Get the buffer address of this desc. */
+	addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+
+	/* Read a word from FIFO for Rx. */
+	if (is_rx)
+		data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
+
+	if (vring->cur_len + sizeof(u64) <= len) {
+		/* The whole word. */
+		if (is_rx)
+			memcpy(addr + vring->cur_len, &data, sizeof(u64));
+		else
+			memcpy(&data, addr + vring->cur_len, sizeof(u64));
+		vring->cur_len += sizeof(u64);
+	} else {
+		/* Leftover bytes. */
+		if (is_rx)
+			memcpy(addr + vring->cur_len, &data,
+			       len - vring->cur_len);
+		else
+			memcpy(&data, addr + vring->cur_len,
+			       len - vring->cur_len);
+		vring->cur_len = len;
+	}
+
+	/* Write the word into FIFO for Tx. */
+	if (!is_rx)
+		writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+}
+
+/*
+ * Rx/Tx packet header.
+ *
+ * In Rx case, the packet might be found to belong to a different vring since
+ * the TmFifo is shared by different services. In such case, the 'vring_change'
+ * flag is set.
+ */
+static void mlxbf_tmfifo_rxtx_header(struct mlxbf_tmfifo_vring *vring,
+				     struct vring_desc *desc,
+				     bool is_rx, bool *vring_change)
+{
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	struct virtio_net_config *config;
+	union mlxbf_tmfifo_msg_hdr hdr;
+	int vdev_id, hdr_len;
+
+	/* Read/Write packet header. */
+	if (is_rx) {
+		/* Drain one word from the FIFO. */
+		hdr.data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
+
+		/* Skip the length 0 packets (keepalive). */
+		if (hdr.len == 0)
+			return;
+
+		/* Check packet type. */
+		if (hdr.type == VIRTIO_ID_NET) {
+			vdev_id = VIRTIO_ID_NET;
+			hdr_len = sizeof(struct virtio_net_hdr);
+			config = &fifo->vdev[vdev_id]->config.net;
+			if (ntohs(hdr.len) > config->mtu +
+			    MLXBF_TMFIFO_NET_L2_OVERHEAD)
+				return;
+		} else {
+			vdev_id = VIRTIO_ID_CONSOLE;
+			hdr_len = 0;
+		}
+
+		/*
+		 * Check whether the new packet still belongs to this vring.
+		 * If not, update the pkt_len of the new vring.
+		 */
+		if (vdev_id != vring->vdev_id) {
+			struct mlxbf_tmfifo_vdev *tm_dev2 = fifo->vdev[vdev_id];
+
+			if (!tm_dev2)
+				return;
+			vring->desc = desc;
+			vring = &tm_dev2->vrings[MLXBF_TMFIFO_VRING_RX];
+			*vring_change = true;
+		}
+		vring->pkt_len = ntohs(hdr.len) + hdr_len;
+	} else {
+		/* Network virtio has an extra header. */
+		hdr_len = (vring->vdev_id == VIRTIO_ID_NET) ?
+			   sizeof(struct virtio_net_hdr) : 0;
+		vring->pkt_len = mlxbf_tmfifo_get_pkt_len(vring, desc);
+		hdr.data = 0;
+		hdr.type = (vring->vdev_id == VIRTIO_ID_NET) ?
+			    VIRTIO_ID_NET : VIRTIO_ID_CONSOLE;
+		hdr.len = htons(vring->pkt_len - hdr_len);
+		writeq(hdr.data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+	}
+
+	vring->cur_len = hdr_len;
+	vring->rem_len = vring->pkt_len;
+	fifo->vring[is_rx] = vring;
+}
+
+/*
+ * Rx/Tx one descriptor.
+ *
+ * Return true to indicate more data available.
+ */
+static bool mlxbf_tmfifo_rxtx_one_desc(struct mlxbf_tmfifo_vring *vring,
+				       bool is_rx, int *avail)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	struct virtio_device *vdev;
+	bool vring_change = false;
+	struct vring_desc *desc;
+	unsigned long flags;
+	u32 len, idx;
+
+	vdev = &fifo->vdev[vring->vdev_id]->vdev;
+
+	/* Get the descriptor of the next packet. */
+	if (!vring->desc) {
+		desc = mlxbf_tmfifo_get_next_pkt(vring, is_rx);
+		if (!desc)
+			return false;
+	} else {
+		desc = vring->desc;
+	}
+
+	/* Beginning of a packet. Start to Rx/Tx packet header. */
+	if (vring->pkt_len == 0) {
+		mlxbf_tmfifo_rxtx_header(vring, desc, is_rx, &vring_change);
+		(*avail)--;
+
+		/* Return if new packet is for another ring. */
+		if (vring_change)
+			return false;
+		goto mlxbf_tmfifo_desc_done;
+	}
+
+	/* Get the length of this desc. */
+	len = virtio32_to_cpu(vdev, desc->len);
+	if (len > vring->rem_len)
+		len = vring->rem_len;
+
+	/* Rx/Tx one word (8 bytes) if not done. */
+	if (vring->cur_len < len) {
+		mlxbf_tmfifo_rxtx_word(vring, desc, is_rx, len);
+		(*avail)--;
+	}
+
+	/* Check again whether it's done. */
+	if (vring->cur_len == len) {
+		vring->cur_len = 0;
+		vring->rem_len -= len;
+
+		/* Get the next desc on the chain. */
+		if (vring->rem_len > 0 &&
+		    (virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT)) {
+			idx = virtio16_to_cpu(vdev, desc->next);
+			desc = &vr->desc[idx];
+			goto mlxbf_tmfifo_desc_done;
+		}
+
+		/* Done and release the pending packet. */
+		mlxbf_tmfifo_release_pending_pkt(vring);
+		desc = NULL;
+		fifo->vring[is_rx] = NULL;
+
+		/* Notify upper layer that packet is done. */
+		spin_lock_irqsave(&fifo->spin_lock, flags);
+		vring_interrupt(0, vring->vq);
+		spin_unlock_irqrestore(&fifo->spin_lock, flags);
+	}
+
+mlxbf_tmfifo_desc_done:
+	/* Save the current desc. */
+	vring->desc = desc;
+
+	return true;
+}
+
+/* Rx & Tx processing of a queue. */
+static void mlxbf_tmfifo_rxtx(struct mlxbf_tmfifo_vring *vring, bool is_rx)
+{
+	int avail = 0, devid = vring->vdev_id;
+	struct mlxbf_tmfifo *fifo;
+	bool more;
+
+	fifo = vring->fifo;
+
+	/* Return if vdev is not ready. */
+	if (!fifo->vdev[devid])
+		return;
+
+	/* Return if another vring is running. */
+	if (fifo->vring[is_rx] && fifo->vring[is_rx] != vring)
+		return;
+
+	/* Only handle console and network for now. */
+	if (WARN_ON(devid != VIRTIO_ID_NET && devid != VIRTIO_ID_CONSOLE))
+		return;
+
+	do {
+		/* Get available FIFO space. */
+		if (avail == 0) {
+			if (is_rx)
+				avail = mlxbf_tmfifo_get_rx_avail(fifo);
+			else
+				avail = mlxbf_tmfifo_get_tx_avail(fifo, devid);
+			if (avail <= 0)
+				break;
+		}
+
+		/* Console output always comes from the Tx buffer. */
+		if (!is_rx && devid == VIRTIO_ID_CONSOLE) {
+			mlxbf_tmfifo_console_tx(fifo, avail);
+			break;
+		}
+
+		/* Handle one descriptor. */
+		more = mlxbf_tmfifo_rxtx_one_desc(vring, is_rx, &avail);
+	} while (more);
+}
+
+/* Handle Rx or Tx queues. */
+static void mlxbf_tmfifo_work_rxtx(struct mlxbf_tmfifo *fifo, int queue_id,
+				   int irq_id, bool is_rx)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct mlxbf_tmfifo_vring *vring;
+	int i;
+
+	if (!test_and_clear_bit(irq_id, &fifo->pend_events) ||
+	    !fifo->irq_info[irq_id].irq)
+		return;
+
+	for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++) {
+		tm_vdev = fifo->vdev[i];
+		if (tm_vdev) {
+			vring = &tm_vdev->vrings[queue_id];
+			if (vring->vq)
+				mlxbf_tmfifo_rxtx(vring, is_rx);
+		}
+	}
+}
+
+/* Work handler for Rx and Tx case. */
+static void mlxbf_tmfifo_work_handler(struct work_struct *work)
+{
+	struct mlxbf_tmfifo *fifo;
+
+	fifo = container_of(work, struct mlxbf_tmfifo, work);
+	if (!fifo->is_ready)
+		return;
+
+	mutex_lock(&fifo->lock);
+
+	/* Tx (Send data to the TmFifo). */
+	mlxbf_tmfifo_work_rxtx(fifo, MLXBF_TMFIFO_VRING_TX,
+			       MLXBF_TM_TX_LWM_IRQ, false);
+
+	/* Rx (Receive data from the TmFifo). */
+	mlxbf_tmfifo_work_rxtx(fifo, MLXBF_TMFIFO_VRING_RX,
+			       MLXBF_TM_RX_HWM_IRQ, true);
+
+	mutex_unlock(&fifo->lock);
+}
+
+/* The notify function is called when new buffers are posted. */
+static bool mlxbf_tmfifo_virtio_notify(struct virtqueue *vq)
+{
+	struct mlxbf_tmfifo_vring *vring = vq->priv;
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct mlxbf_tmfifo *fifo;
+	unsigned long flags;
+
+	fifo = vring->fifo;
+
+	/*
+	 * Virtio maintains vrings in pairs, even number ring for Rx
+	 * and odd number ring for Tx.
+	 */
+	if (!(vring->index & BIT(0))) {
+		if (test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events))
+			return true;
+	} else {
+		/*
+		 * Console could make blocking call with interrupts disabled.
+		 * In such case, the vring needs to be served right away. For
+		 * other cases, just set the TX LWM bit to start Tx in the
+		 * worker handler.
+		 */
+		if (vring->vdev_id == VIRTIO_ID_CONSOLE) {
+			spin_lock_irqsave(&fifo->spin_lock, flags);
+			tm_vdev = fifo->vdev[VIRTIO_ID_CONSOLE];
+			mlxbf_tmfifo_console_output(tm_vdev, vring);
+			spin_unlock_irqrestore(&fifo->spin_lock, flags);
+		} else if (test_and_set_bit(MLXBF_TM_TX_LWM_IRQ,
+					    &fifo->pend_events)) {
+			return true;
+		}
+	}
+
+	schedule_work(&fifo->work);
+
+	return true;
+}
+
+/* Get the array of feature bits for this device. */
+static u64 mlxbf_tmfifo_virtio_get_features(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	return tm_vdev->features;
+}
+
+/* Confirm device features to use. */
+static int mlxbf_tmfifo_virtio_finalize_features(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->features = vdev->features;
+
+	return 0;
+}
+
+/* Free virtqueues found by find_vqs(). */
+static void mlxbf_tmfifo_virtio_del_vqs(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+	struct mlxbf_tmfifo_vring *vring;
+	struct virtqueue *vq;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+
+		/* Release the pending packet. */
+		if (vring->desc)
+			mlxbf_tmfifo_release_pending_pkt(vring);
+		vq = vring->vq;
+		if (vq) {
+			vring->vq = NULL;
+			vring_del_virtqueue(vq);
+		}
+	}
+}
+
+/* Create and initialize the virtual queues. */
+static int mlxbf_tmfifo_virtio_find_vqs(struct virtio_device *vdev,
+					unsigned int nvqs,
+					struct virtqueue *vqs[],
+					vq_callback_t *callbacks[],
+					const char * const names[],
+					const bool *ctx,
+					struct irq_affinity *desc)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+	struct mlxbf_tmfifo_vring *vring;
+	struct virtqueue *vq;
+	int i, ret, size;
+
+	if (nvqs > ARRAY_SIZE(tm_vdev->vrings))
+		return -EINVAL;
+
+	for (i = 0; i < nvqs; ++i) {
+		if (!names[i]) {
+			ret = -EINVAL;
+			goto error;
+		}
+		vring = &tm_vdev->vrings[i];
+
+		/* zero vring */
+		size = vring_size(vring->num, vring->align);
+		memset(vring->va, 0, size);
+		vq = vring_new_virtqueue(i, vring->num, vring->align, vdev,
+					 false, false, vring->va,
+					 mlxbf_tmfifo_virtio_notify,
+					 callbacks[i], names[i]);
+		if (!vq) {
+			dev_err(&vdev->dev, "vring_new_virtqueue failed\n");
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		vqs[i] = vq;
+		vring->vq = vq;
+		vq->priv = vring;
+	}
+
+	return 0;
+
+error:
+	mlxbf_tmfifo_virtio_del_vqs(vdev);
+	return ret;
+}
+
+/* Read the status byte. */
+static u8 mlxbf_tmfifo_virtio_get_status(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	return tm_vdev->status;
+}
+
+/* Write the status byte. */
+static void mlxbf_tmfifo_virtio_set_status(struct virtio_device *vdev,
+					   u8 status)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->status = status;
+}
+
+/* Reset the device. Not much here for now. */
+static void mlxbf_tmfifo_virtio_reset(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->status = 0;
+}
+
+/* Read the value of a configuration field. */
+static void mlxbf_tmfifo_virtio_get(struct virtio_device *vdev,
+				    unsigned int offset,
+				    void *buf,
+				    unsigned int len)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	if (offset + len > sizeof(tm_vdev->config))
+		return;
+
+	memcpy(buf, (u8 *)&tm_vdev->config + offset, len);
+}
+
+/* Write the value of a configuration field. */
+static void mlxbf_tmfifo_virtio_set(struct virtio_device *vdev,
+				    unsigned int offset,
+				    const void *buf,
+				    unsigned int len)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	if (offset + len > sizeof(tm_vdev->config))
+		return;
+
+	memcpy((u8 *)&tm_vdev->config + offset, buf, len);
+}
+
+static void tmfifo_virtio_dev_release(struct device *device)
+{
+	struct virtio_device *vdev =
+			container_of(device, struct virtio_device, dev);
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	kfree(tm_vdev);
+}
+
+/* Virtio config operations. */
+static const struct virtio_config_ops mlxbf_tmfifo_virtio_config_ops = {
+	.get_features = mlxbf_tmfifo_virtio_get_features,
+	.finalize_features = mlxbf_tmfifo_virtio_finalize_features,
+	.find_vqs = mlxbf_tmfifo_virtio_find_vqs,
+	.del_vqs = mlxbf_tmfifo_virtio_del_vqs,
+	.reset = mlxbf_tmfifo_virtio_reset,
+	.set_status = mlxbf_tmfifo_virtio_set_status,
+	.get_status = mlxbf_tmfifo_virtio_get_status,
+	.get = mlxbf_tmfifo_virtio_get,
+	.set = mlxbf_tmfifo_virtio_set,
+};
+
+/* Create vdev type in a tmfifo. */
+static int mlxbf_tmfifo_create_vdev(struct device *dev,
+				    struct mlxbf_tmfifo *fifo,
+				    int vdev_id, u64 features,
+				    void *config, u32 size)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev, *reg_dev = NULL;
+	int ret;
+
+	mutex_lock(&fifo->lock);
+
+	tm_vdev = fifo->vdev[vdev_id];
+	if (tm_vdev) {
+		dev_err(dev, "vdev %d already exists\n", vdev_id);
+		ret = -EEXIST;
+		goto fail;
+	}
+
+	tm_vdev = kzalloc(sizeof(*tm_vdev), GFP_KERNEL);
+	if (!tm_vdev) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	tm_vdev->vdev.id.device = vdev_id;
+	tm_vdev->vdev.config = &mlxbf_tmfifo_virtio_config_ops;
+	tm_vdev->vdev.dev.parent = &fifo->pdev->dev;
+	tm_vdev->vdev.dev.release = tmfifo_virtio_dev_release;
+	tm_vdev->features = features;
+	if (config)
+		memcpy(&tm_vdev->config, config, size);
+
+	if (mlxbf_tmfifo_alloc_vrings(fifo, tm_vdev)) {
+		dev_err(dev, "unable to allocate vring\n");
+		ret = -ENOMEM;
+		goto vdev_fail;
+	}
+
+	/* Allocate an output buffer for the console device. */
+	if (vdev_id == VIRTIO_ID_CONSOLE)
+		tm_vdev->tx_buf.buf = devm_kmalloc(dev,
+						   MLXBF_TMFIFO_CON_TX_BUF_SIZE,
+						   GFP_KERNEL);
+	fifo->vdev[vdev_id] = tm_vdev;
+
+	/* Register the virtio device. */
+	ret = register_virtio_device(&tm_vdev->vdev);
+	reg_dev = tm_vdev;
+	if (ret) {
+		dev_err(&fifo->pdev->dev, "register_virtio_device failed\n");
+		goto vdev_fail;
+	}
+
+	mutex_unlock(&fifo->lock);
+	return 0;
+
+vdev_fail:
+	mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
+	fifo->vdev[vdev_id] = NULL;
+	if (reg_dev)
+		put_device(&tm_vdev->vdev.dev);
+	else
+		kfree(tm_vdev);
+fail:
+	mutex_unlock(&fifo->lock);
+	return ret;
+}
+
+/* Delete vdev type from a tmfifo. */
+static int mlxbf_tmfifo_delete_vdev(struct mlxbf_tmfifo *fifo, int vdev_id)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	mutex_lock(&fifo->lock);
+
+	/* Unregister vdev. */
+	tm_vdev = fifo->vdev[vdev_id];
+	if (tm_vdev) {
+		unregister_virtio_device(&tm_vdev->vdev);
+		mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
+		fifo->vdev[vdev_id] = NULL;
+	}
+
+	mutex_unlock(&fifo->lock);
+
+	return 0;
+}
+
+/* Read the configured network MAC address from efi variable. */
+static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
+{
+	efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
+	unsigned long size = ETH_ALEN;
+	efi_status_t status;
+	u8 buf[ETH_ALEN];
+
+	status = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size,
+				  buf);
+	if (status == EFI_SUCCESS && size == ETH_ALEN)
+		ether_addr_copy(mac, buf);
+	else
+		memcpy(mac, mlxbf_tmfifo_net_default_mac, ETH_ALEN);
+}
+
+/* Set TmFifo thresholds which are used to trigger interrupts. */
+static void mlxbf_tmfifo_set_threshold(struct mlxbf_tmfifo *fifo)
+{
+	u64 ctl;
+
+	/* Get Tx FIFO size and set the low/high watermark. */
+	ctl = readq(fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
+	fifo->tx_fifo_size =
+		FIELD_GET(MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK, ctl);
+	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__LWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__LWM_MASK,
+			   fifo->tx_fifo_size / 2);
+	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__HWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__HWM_MASK,
+			   fifo->tx_fifo_size - 1);
+	writeq(ctl, fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
+
+	/* Get Rx FIFO size and set the low/high watermark. */
+	ctl = readq(fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
+	fifo->rx_fifo_size =
+		FIELD_GET(MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK, ctl);
+	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__LWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__LWM_MASK, 0);
+	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__HWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__HWM_MASK, 1);
+	writeq(ctl, fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
+}
+
+static void mlxbf_tmfifo_cleanup(struct mlxbf_tmfifo *fifo)
+{
+	int i;
+
+	fifo->is_ready = false;
+	del_timer_sync(&fifo->timer);
+	mlxbf_tmfifo_disable_irqs(fifo);
+	cancel_work_sync(&fifo->work);
+	for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++)
+		mlxbf_tmfifo_delete_vdev(fifo, i);
+}
+
+/* Probe the TMFIFO. */
+static int mlxbf_tmfifo_probe(struct platform_device *pdev)
+{
+	struct virtio_net_config net_config;
+	struct mlxbf_tmfifo *fifo;
+	struct resource *res;
+	int i, ret;
+
+	fifo = devm_kzalloc(&pdev->dev, sizeof(*fifo), GFP_KERNEL);
+	if (!fifo)
+		return -ENOMEM;
+
+	spin_lock_init(&fifo->spin_lock);
+	INIT_WORK(&fifo->work, mlxbf_tmfifo_work_handler);
+	mutex_init(&fifo->lock);
+
+	/* Get the resource of the Rx FIFO. */
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	fifo->rx_base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(fifo->rx_base))
+		return PTR_ERR(fifo->rx_base);
+
+	/* Get the resource of the Tx FIFO. */
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+	fifo->tx_base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(fifo->tx_base))
+		return PTR_ERR(fifo->tx_base);
+
+	fifo->pdev = pdev;
+	platform_set_drvdata(pdev, fifo);
+
+	timer_setup(&fifo->timer, mlxbf_tmfifo_timer, 0);
+
+	for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
+		fifo->irq_info[i].index = i;
+		fifo->irq_info[i].fifo = fifo;
+		fifo->irq_info[i].irq = platform_get_irq(pdev, i);
+		ret = devm_request_irq(&pdev->dev, fifo->irq_info[i].irq,
+				       mlxbf_tmfifo_irq_handler, 0,
+				       "tmfifo", &fifo->irq_info[i]);
+		if (ret) {
+			dev_err(&pdev->dev, "devm_request_irq failed\n");
+			fifo->irq_info[i].irq = 0;
+			return ret;
+		}
+	}
+
+	mlxbf_tmfifo_set_threshold(fifo);
+
+	/* Create the console vdev. */
+	ret = mlxbf_tmfifo_create_vdev(&pdev->dev, fifo, VIRTIO_ID_CONSOLE, 0,
+				       NULL, 0);
+	if (ret)
+		goto fail;
+
+	/* Create the network vdev. */
+	memset(&net_config, 0, sizeof(net_config));
+	net_config.mtu = ETH_DATA_LEN;
+	net_config.status = VIRTIO_NET_S_LINK_UP;
+	mlxbf_tmfifo_get_cfg_mac(net_config.mac);
+	ret = mlxbf_tmfifo_create_vdev(&pdev->dev, fifo, VIRTIO_ID_NET,
+				       MLXBF_TMFIFO_NET_FEATURES, &net_config,
+				       sizeof(net_config));
+	if (ret)
+		goto fail;
+
+	mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
+
+	fifo->is_ready = true;
+	return 0;
+
+fail:
+	mlxbf_tmfifo_cleanup(fifo);
+	return ret;
+}
+
+/* Device remove function. */
+static int mlxbf_tmfifo_remove(struct platform_device *pdev)
+{
+	struct mlxbf_tmfifo *fifo = platform_get_drvdata(pdev);
+
+	mlxbf_tmfifo_cleanup(fifo);
+
+	return 0;
+}
+
+static const struct acpi_device_id mlxbf_tmfifo_acpi_match[] = {
+	{ "MLNXBF01", 0 },
+	{}
+};
+MODULE_DEVICE_TABLE(acpi, mlxbf_tmfifo_acpi_match);
+
+static struct platform_driver mlxbf_tmfifo_driver = {
+	.probe = mlxbf_tmfifo_probe,
+	.remove = mlxbf_tmfifo_remove,
+	.driver = {
+		.name = "bf-tmfifo",
+		.acpi_match_table = mlxbf_tmfifo_acpi_match,
+	},
+};
+
+module_platform_driver(mlxbf_tmfifo_driver);
+
+MODULE_DESCRIPTION("Mellanox BlueField SoC TmFifo Driver");
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Mellanox Technologies");
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v13] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
  2019-04-04 19:36 ` [PATCH v13] " Liming Sun
@ 2019-04-05 15:44   ` Andy Shevchenko
  2019-04-05 19:10     ` Liming Sun
  0 siblings, 1 reply; 30+ messages in thread
From: Andy Shevchenko @ 2019-04-05 15:44 UTC (permalink / raw)
  To: Liming Sun
  Cc: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak,
	Linux Kernel Mailing List, Platform Driver

On Thu, Apr 4, 2019 at 10:36 PM Liming Sun <lsun@mellanox.com> wrote:
> This commit adds the TmFifo platform driver for Mellanox BlueField
> Soc. TmFifo is a shared FIFO which enables external host machine
> to exchange data with the SoC via USB or PCIe. The driver is based
> on virtio framework and has console and network access enabled.

Thanks for an update. Almost good.
My comments below.

Meanwhile I pushed this to my review and testing queue, thanks!

> +#include <linux/acpi.h>
> +#include <linux/bitfield.h>
> +#include <linux/circ_buf.h>
> +#include <linux/efi.h>
> +#include <linux/irq.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/platform_device.h>
> +#include <linux/types.h>

Perhaps blank line here. Would be more clear that this is utilizing
virtio framework.

> +#include <linux/virtio_config.h>
> +#include <linux/virtio_console.h>
> +#include <linux/virtio_ids.h>
> +#include <linux/virtio_net.h>
> +#include <linux/virtio_ring.h>

> +/**
> + * mlxbf_tmfifo_msg_hdr - Structure of the TmFifo message header
> + * @type: message type
> + * @len: payload length
> + * @u: 64-bit union data
> + */
> +union mlxbf_tmfifo_msg_hdr {
> +       struct {
> +               u8 type;
> +               __be16 len;
> +               u8 unused[5];
> +       } __packed;
> +       u64 data;

I'm not sure I understand how you can distinguish which field of union to use?
Isn't here some type missed?

> +};
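
(For context, the Tx path in this patch fills the fields through the
anonymous struct and then writes the whole header as one 64-bit FIFO word
through 'data':

	hdr.data = 0;
	hdr.type = VIRTIO_ID_CONSOLE;
	hdr.len = htons(size);
	writeq(hdr.data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);

so the union is just two views of the same 8 bytes; an extra tag to select
a view does not seem to be required here.)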

> +static u8 mlxbf_tmfifo_net_default_mac[ETH_ALEN] = {

> +       0x00, 0x1A, 0xCA, 0xFF, 0xFF, 0x01};

This should be two lines.

> +/* Supported virtio-net features. */
> +#define MLXBF_TMFIFO_NET_FEATURES      (BIT_ULL(VIRTIO_NET_F_MTU) | \
> +                                        BIT_ULL(VIRTIO_NET_F_STATUS) | \
> +                                        BIT_ULL(VIRTIO_NET_F_MAC))

Better to write as

#define FOO \
(BIT(x) | BIT(y) ...)

I think I told this earlier?

> +/* Allocate vrings for the fifo. */

fifo -> FIFO (and check all occurrences)

> +static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
> +                                    struct mlxbf_tmfifo_vdev *tm_vdev)
> +{
> +       struct mlxbf_tmfifo_vring *vring;
> +       struct device *dev;
> +       dma_addr_t dma;
> +       int i, size;
> +       void *va;
> +
> +       for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
> +               vring = &tm_vdev->vrings[i];
> +               vring->fifo = fifo;
> +               vring->num = MLXBF_TMFIFO_VRING_SIZE;
> +               vring->align = SMP_CACHE_BYTES;
> +               vring->index = i;
> +               vring->vdev_id = tm_vdev->vdev.id.device;
> +               dev = &tm_vdev->vdev.dev;
> +
> +               size = vring_size(vring->num, vring->align);
> +               va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL);
> +               if (!va) {

> +                       dev_err(dev->parent, "dma_alloc_coherent failed\n");

I don't see how this will free the allocated entries.
I think I told about this either.

> +                       return -ENOMEM;
> +               }
> +
> +               vring->va = va;
> +               vring->dma = dma;
> +       }
> +
> +       return 0;
> +}

> +/* House-keeping timer. */
> +static void mlxbf_tmfifo_timer(struct timer_list *arg)
> +{

> +       struct mlxbf_tmfifo *fifo = container_of(arg, struct mlxbf_tmfifo,
> +                                                timer);

One line would be still good enough.

> +       int more;
> +
> +       more = !test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events) ||
> +                   !test_and_set_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events);
> +
> +       if (more)
> +               schedule_work(&fifo->work);
> +
> +       mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
> +}

> +       status = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size,
> +                                 buf);
> +       if (status == EFI_SUCCESS && size == ETH_ALEN)
> +               ether_addr_copy(mac, buf);
> +       else

> +               memcpy(mac, mlxbf_tmfifo_net_default_mac, ETH_ALEN);

ether_addr_copy() as well.

> +}

> +       fifo->pdev = pdev;

Do you really need to keep pdev there? Isn't struct device pointer enough?


> +       /* Create the console vdev. */
> +       ret = mlxbf_tmfifo_create_vdev(&pdev->dev, fifo, VIRTIO_ID_CONSOLE, 0,
> +                                      NULL, 0);

If you define temporary variable
  struct device *dev = &pdev->dev;
these lines can be merged into one.

> +       if (ret)
> +               goto fail;

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v13] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
  2019-04-05 15:44   ` Andy Shevchenko
@ 2019-04-05 19:10     ` Liming Sun
  2019-04-07  2:05       ` Liming Sun
  0 siblings, 1 reply; 30+ messages in thread
From: Liming Sun @ 2019-04-05 19:10 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak,
	Linux Kernel Mailing List, Platform Driver

Thanks Andy! I'll address the comments in v14.

Some question for the comment below:

> > +               size = vring_size(vring->num, vring->align);
> > +               va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL);
> > +               if (!va) {
> 
> > +                       dev_err(dev->parent, "dma_alloc_coherent failed\n");
> 
> I don't see how this will free the allocated entries.
> I think I told about this either.

When an error is returned, all the allocated entries will be released in the
caller context by calling mlxbf_tmfifo_free_vrings(), as in the logic below.
Or do you prefer releasing the entries in mlxbf_tmfifo_alloc_vrings() instead?

1073         if (mlxbf_tmfifo_alloc_vrings(fifo, tm_vdev)) {
1074                 dev_err(dev, "unable to allocate vring\n");
1075                 ret = -ENOMEM;
1076                 goto vdev_fail;
1077         }
...
1097 vdev_fail:
1098         mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
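
If releasing inside mlxbf_tmfifo_alloc_vrings() itself is preferred, it would
be a small change on top of the current loop, roughly like below (sketch only;
it reuses the existing free helper, which skips rings whose 'va' is still
NULL):

		va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL);
		if (!va) {
			dev_err(dev->parent, "dma_alloc_coherent failed\n");
			/* Undo the rings allocated in earlier iterations. */
			mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
			return -ENOMEM;
		}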

Regards,
Liming

> -----Original Message-----
> From: Andy Shevchenko <andy.shevchenko@gmail.com>
> Sent: Friday, April 5, 2019 11:44 AM
> To: Liming Sun <lsun@mellanox.com>
> Cc: David Woods <dwoods@mellanox.com>; Andy Shevchenko <andy@infradead.org>; Darren Hart <dvhart@infradead.org>; Vadim
> Pasternak <vadimp@mellanox.com>; Linux Kernel Mailing List <linux-kernel@vger.kernel.org>; Platform Driver <platform-driver-
> x86@vger.kernel.org>
> Subject: Re: [PATCH v13] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
> 
> On Thu, Apr 4, 2019 at 10:36 PM Liming Sun <lsun@mellanox.com> wrote:
> > This commit adds the TmFifo platform driver for Mellanox BlueField
> > Soc. TmFifo is a shared FIFO which enables external host machine
> > to exchange data with the SoC via USB or PCIe. The driver is based
> > on virtio framework and has console and network access enabled.
> 
> Thanks for an update. Almost good.
> My comments below.
> 
> Meanwhile I pushed this to my review and testing queue, thanks!
> 
> > +#include <linux/acpi.h>
> > +#include <linux/bitfield.h>
> > +#include <linux/circ_buf.h>
> > +#include <linux/efi.h>
> > +#include <linux/irq.h>
> > +#include <linux/module.h>
> > +#include <linux/mutex.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/types.h>
> 
> Perhaps blank line here. Would be more clear that this is utilizing
> virtio framework.
> 
> > +#include <linux/virtio_config.h>
> > +#include <linux/virtio_console.h>
> > +#include <linux/virtio_ids.h>
> > +#include <linux/virtio_net.h>
> > +#include <linux/virtio_ring.h>
> 
> > +/**
> > + * mlxbf_tmfifo_msg_hdr - Structure of the TmFifo message header
> > + * @type: message type
> > + * @len: payload length
> > + * @u: 64-bit union data
> > + */
> > +union mlxbf_tmfifo_msg_hdr {
> > +       struct {
> > +               u8 type;
> > +               __be16 len;
> > +               u8 unused[5];
> > +       } __packed;
> > +       u64 data;
> 
> I'm not sure I understand how you can distinguish which field of union to use?
> Isn't here some type missed?
> 
> > +};
> 
> > +static u8 mlxbf_tmfifo_net_default_mac[ETH_ALEN] = {
> 
> > +       0x00, 0x1A, 0xCA, 0xFF, 0xFF, 0x01};
> 
> This should be two lines.
> 
> > +/* Supported virtio-net features. */
> > +#define MLXBF_TMFIFO_NET_FEATURES      (BIT_ULL(VIRTIO_NET_F_MTU) | \
> > +                                        BIT_ULL(VIRTIO_NET_F_STATUS) | \
> > +                                        BIT_ULL(VIRTIO_NET_F_MAC))
> 
> Better to write as
> 
> #define FOO \
> (BIT(x) | BIT(y) ...)
> 
> I think I told this earlier?
> 
> > +/* Allocate vrings for the fifo. */
> 
> fifo -> FIFO (and check all occurrences)
> 
> > +static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
> > +                                    struct mlxbf_tmfifo_vdev *tm_vdev)
> > +{
> > +       struct mlxbf_tmfifo_vring *vring;
> > +       struct device *dev;
> > +       dma_addr_t dma;
> > +       int i, size;
> > +       void *va;
> > +
> > +       for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
> > +               vring = &tm_vdev->vrings[i];
> > +               vring->fifo = fifo;
> > +               vring->num = MLXBF_TMFIFO_VRING_SIZE;
> > +               vring->align = SMP_CACHE_BYTES;
> > +               vring->index = i;
> > +               vring->vdev_id = tm_vdev->vdev.id.device;
> > +               dev = &tm_vdev->vdev.dev;
> > +
> > +               size = vring_size(vring->num, vring->align);
> > +               va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL);
> > +               if (!va) {
> 
> > +                       dev_err(dev->parent, "dma_alloc_coherent failed\n");
> 
> I don't see how this will free the allocated entries.
> I think I told about this either.
> 
> > +                       return -ENOMEM;
> > +               }
> > +
> > +               vring->va = va;
> > +               vring->dma = dma;
> > +       }
> > +
> > +       return 0;
> > +}
> 
> > +/* House-keeping timer. */
> > +static void mlxbf_tmfifo_timer(struct timer_list *arg)
> > +{
> 
> > +       struct mlxbf_tmfifo *fifo = container_of(arg, struct mlxbf_tmfifo,
> > +                                                timer);
> 
> One line would be still good enough.
> 
> > +       int more;
> > +
> > +       more = !test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events) ||
> > +                   !test_and_set_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events);
> > +
> > +       if (more)
> > +               schedule_work(&fifo->work);
> > +
> > +       mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
> > +}
> 
> > +       status = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size,
> > +                                 buf);
> > +       if (status == EFI_SUCCESS && size == ETH_ALEN)
> > +               ether_addr_copy(mac, buf);
> > +       else
> 
> > +               memcpy(mac, mlxbf_tmfifo_net_default_mac, ETH_ALEN);
> 
> ether_addr_copy() as well.
> 
> > +}
> 
> > +       fifo->pdev = pdev;
> 
> Do you really need to keep pdev there? Isn't struct device pointer enough?
> 
> 
> > +       /* Create the console vdev. */
> > +       ret = mlxbf_tmfifo_create_vdev(&pdev->dev, fifo, VIRTIO_ID_CONSOLE, 0,
> > +                                      NULL, 0);
> 
> If you define temporary variable
>   struct device *dev = &pdev->dev;
> these lines can be merged into one.
> 
> > +       if (ret)
> > +               goto fail;
> 
> --
> With Best Regards,
> Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v14] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
       [not found] <b143b40446c1870fb8d422b364ead95d54552be9.1527264077.git.lsun@mellanox.com>
                   ` (8 preceding siblings ...)
  2019-04-04 19:36 ` [PATCH v13] " Liming Sun
@ 2019-04-07  2:03 ` Liming Sun
  2019-04-11 14:09   ` Andy Shevchenko
  2019-04-12 17:30 ` [PATCH v15] " Liming Sun
  2019-05-03 13:49 ` [PATCH v16] " Liming Sun
  11 siblings, 1 reply; 30+ messages in thread
From: Liming Sun @ 2019-04-07  2:03 UTC (permalink / raw)
  To: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak
  Cc: Liming Sun, linux-kernel, platform-driver-x86

This commit adds the TmFifo platform driver for Mellanox BlueField
Soc. TmFifo is a shared FIFO which enables external host machine
to exchange data with the SoC via USB or PCIe. The driver is based
on virtio framework and has console and network access enabled.

Reviewed-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Liming Sun <lsun@mellanox.com>
---
v13->v14:
    Fixes for comments from Andy:
    - Add a blank line to separate the virtio header files;
    - Update the comment for 'union mlxbf_tmfifo_msg_hdr' to make it
      clearer how this union is used;
    - Update the 'mlxbf_tmfifo_net_default_mac[ETH_ALEN]' definition
      to be two lines;
    - Reformat macro MLXBF_TMFIFO_NET_FEATURES to put the definition
      in a separate line;
    - Update all 'fifo' to 'FIFO' in the comments;
    - Update mlxbf_tmfifo_alloc_vrings() to specifically release the
      allocated entries in case of failures, so the logic looks
      clearer. In the caller function the mlxbf_tmfifo_free_vrings()
      might be called again in case of other failures, which is ok
      since the 'va' pointer will be set to NULL once released;
    - Update mlxbf_tmfifo_timer() to change the first statement to
      one line;
    - Update one memcpy() to ether_addr_copy() in
      mlxbf_tmfifo_get_cfg_mac();
    - Remove 'fifo->pdev' since it is really not needed;
    - Define temporary variable to update the mlxbf_tmfifo_create_vdev()
      statement into single line.
    New changes by Liming:
    - Reorder the logic a little bit in mlxbf_tmfifo_timer(). Previously
      it had logic like "!a || !b", where '!b' is not evaluated when
      '!a' is true. It was changed to that form during review, but that
      is not the desired behavior since both bits need to be tested/set
      in fifo->pend_events. This issue, found during verification,
      caused extra delays for Tx packets.
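      A short illustration of the difference (sketch only; the local
      variables are just for the example):

	/* Short-circuit: when the RX bit was newly set, the second
	 * test_and_set_bit() is skipped, so TX_LWM may never be set. */
	more = !test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events) ||
	       !test_and_set_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events);

	/* Desired: evaluate both bits, then combine the results. */
	rx = !test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events);
	tx = !test_and_set_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events);
	more = rx || tx;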
v12->v13:
    Rebase and resubmit (no new changes).
v11->v12:
    Fixed the two unsolved comments from v11.
    - "Change macro mlxbf_vdev_to_tmfifo() to one line"
      Done. Seems not hard.
    - "Is it appropriate use of devm_* for 'tm_vdev = devm_kzalloc'"
      Yes, understand the comment now. The tmfifo is fixed, but the
      vdev is dynamic. Use kzalloc() instead, and free the device
      in the release callback which is the right place for it.
v10->v11:
    Fixes for comments from Andy:
    - Use GENMASK_ULL() instead of GENMASK() in mlxbf-tmfifo-regs.h
    - Removed the cpu_to_le64()/le64_to_cpu() conversion since
      readq()/writeq() already takes care of it.
    - Remove the "if (irq)" check in mlxbf_tmfifo_disable_irqs().
    - Add "u32 count" temp variable in mlxbf_tmfifo_get_tx_avail().
    - Clean up mlxbf_tmfifo_get_cfg_mac(), use ETH_ALEN instead of
      value 6.
    - Change the tx_buf to use Linux existing 'struct circ_buf'.
    Comment not applied:
    - "Change macro mlxbf_vdev_to_tmfifo() to one line"
      Couldn't fit in one line within 80 characters
    - "Is it appropriate use of devm_* for 'tm_vdev = devm_kzalloc'"
      This is a SoC; the device won't be closed or detached.
      The only case is when the driver is unloaded. So it appears
      ok to use devm_kzalloc() since it's allocated during probe()
      and released during module unload.
    Comments from Vadim: OK
v9->v10:
    Fixes for comments from Andy:
    - Use devm_ioremap_resource() instead of devm_ioremap().
    - Use kernel-doc comments.
    - Keep Makefile contents sorted.
    - Use same fixed format for offsets.
    - Use SZ_1K/SZ_32K instead of 1024/23*1024.
    - Remove unnecessary comments.
    - Use one style for max numbers.
    - More comments for mlxbf_tmfifo_vdev and mlxbf_tmfifo_data_64bit.
    - Use globally defined MTU instead of new definition.
    - Remove forward declaration of mlxbf_tmfifo_remove().
    - Remove PAGE_ALIGN() for dma_alloc_coherent)().
    - Remove the cast of "struct vring *".
    - Check return result of test_and_set_bit().
    - Add a macro mlxbt_vdev_to_tmfifo().
    - Several other minor coding style comments.
    Comment not applied:
    - "Shouldn't be rather helper in EFI lib in kernel"
      Looks like efi.get_variable() is the way I found in the kernel
      tree.
    - "this one is not protected anyhow? Potential race condition"
      In mlxbf_tmfifo_console_tx(), the spin-lock is used to protect the
      'tx_buf' only, not the FIFO writes. So there is no race condition.
    - "Is __packed needed in mlxbf_tmfifo_msg_hdr".
      Yes, it is needed to make sure the structure is 8 bytes.
    Fixes for comments from Vadim:
    - Use tab in mlxbf-tmfifo-regs.h
    - Use kernel-doc comments for struct mlxbf_tmfifo_msg_hdr and
      mlxbf_tmfifo_irq_info as well.
    - Use _MAX instead of _CNT in the macro definition to be consistent.
    - Fix the MODULE_LICENSE.
    - Use BIT_ULL() instead of BIT().
    - Remove argument of 'avail' for mlxbf_tmfifo_rxtx_header() and
      mlxbf_tmfifo_rxtx_word()
    - Revise logic in mlxbf_tmfifo_rxtx_one_desc() to remove the
      WARN_ON().
    - Change "union mlxbf_tmfifo_u64 u" to "union mlxbf_tmfifo_u64 buf"
      in mlxbf_tmfifo_rxtx_word().
    - Change data type of vring_change from 'int' to 'bool'.
    - Remove the blank lines after Signed-off.
    - Don’t use declaration in the middle.
    - Make the network header initialization in some more elegant way.
    - Change label done to mlxbf_tmfifo_desc_done.
    - Remove some unnecessary comments, and several other misc coding
      style comments.
    - Simplify code logic in mlxbf_tmfifo_virtio_notify()
    New changes by Liming:
    - Simplify the Rx/Tx function arguments to make it more readable.
v8->v9:
    Fixes for comments from Andy:
    - Use modern devm_xxx() API instead.
    Fixes for comments from Vadim:
    - Split the Rx/Tx function into smaller functions.
    - File name, copyright information.
    - Function and variable name conversion.
    - Local variable and indent coding styles.
    - Remove unnecessary 'inline' declarations.
    - Use devm_xxx() APIs.
    - Move the efi_char16_t MAC address definition to global.
    - Fix warnings reported by 'checkpatch --strict'.
    - Fix warnings reported by 'make CF="-D__CHECK_ENDIAN__"'.
    - Change 'select VIRTIO_xxx' to 'depends on VIRTIO_xxx' in Kconfig.
    - Merge mlxbf_tmfifo_vdev_tx_buf_push() and
      mlxbf_tmfifo_vdev_tx_buf_pop().
    - Add union to avoid casting between __le64 and u64.
    - Several other misc coding style comments.
    New changes by Liming:
    - Removed the DT binding documentation since only ACPI is
      supported for now by UEFI on the SoC.
v8: Re-submit under drivers/platform/mellanox for the target-side
    platform driver only.
v7: Added host side drivers into the same patch set.
v5~v6: Coding style fix.
v1~v4: Initial version for directory drivers/soc/mellanox.
---
 drivers/platform/mellanox/Kconfig             |   12 +-
 drivers/platform/mellanox/Makefile            |    1 +
 drivers/platform/mellanox/mlxbf-tmfifo-regs.h |   63 ++
 drivers/platform/mellanox/mlxbf-tmfifo.c      | 1294 +++++++++++++++++++++++++
 4 files changed, 1369 insertions(+), 1 deletion(-)
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo-regs.h
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo.c

diff --git a/drivers/platform/mellanox/Kconfig b/drivers/platform/mellanox/Kconfig
index cd8a908..530fe7e 100644
--- a/drivers/platform/mellanox/Kconfig
+++ b/drivers/platform/mellanox/Kconfig
@@ -5,7 +5,7 @@
 
 menuconfig MELLANOX_PLATFORM
 	bool "Platform support for Mellanox hardware"
-	depends on X86 || ARM || COMPILE_TEST
+	depends on X86 || ARM || ARM64 || COMPILE_TEST
 	---help---
 	  Say Y here to get to see options for platform support for
 	  Mellanox systems. This option alone does not add any kernel code.
@@ -34,4 +34,14 @@ config MLXREG_IO
 	  to system resets operation, system reset causes monitoring and some
 	  kinds of mux selection.
 
+config MLXBF_TMFIFO
+	tristate "Mellanox BlueField SoC TmFifo platform driver"
+	depends on ARM64
+	depends on ACPI
+	depends on VIRTIO_CONSOLE && VIRTIO_NET
+	help
+	  Say y here to enable TmFifo support. The TmFifo driver provides
+          platform driver support for the TmFifo which supports console
+          and networking based on the virtio framework.
+
 endif # MELLANOX_PLATFORM
diff --git a/drivers/platform/mellanox/Makefile b/drivers/platform/mellanox/Makefile
index 57074d9c..a229bda1 100644
--- a/drivers/platform/mellanox/Makefile
+++ b/drivers/platform/mellanox/Makefile
@@ -3,5 +3,6 @@
 # Makefile for linux/drivers/platform/mellanox
 # Mellanox Platform-Specific Drivers
 #
+obj-$(CONFIG_MLXBF_TMFIFO)	+= mlxbf-tmfifo.o
 obj-$(CONFIG_MLXREG_HOTPLUG)	+= mlxreg-hotplug.o
 obj-$(CONFIG_MLXREG_IO) += mlxreg-io.o
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo-regs.h b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
new file mode 100644
index 0000000..e4f0d2e
--- /dev/null
+++ b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
+ */
+
+#ifndef __MLXBF_TMFIFO_REGS_H__
+#define __MLXBF_TMFIFO_REGS_H__
+
+#include <linux/types.h>
+#include <linux/bits.h>
+
+#define MLXBF_TMFIFO_TX_DATA				0x00
+#define MLXBF_TMFIFO_TX_STS				0x08
+#define MLXBF_TMFIFO_TX_STS__LENGTH			0x0001
+#define MLXBF_TMFIFO_TX_STS__COUNT_SHIFT		0
+#define MLXBF_TMFIFO_TX_STS__COUNT_WIDTH		9
+#define MLXBF_TMFIFO_TX_STS__COUNT_RESET_VAL		0
+#define MLXBF_TMFIFO_TX_STS__COUNT_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_TX_STS__COUNT_MASK			GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_TX_CTL				0x10
+#define MLXBF_TMFIFO_TX_CTL__LENGTH			0x0001
+#define MLXBF_TMFIFO_TX_CTL__LWM_SHIFT			0
+#define MLXBF_TMFIFO_TX_CTL__LWM_WIDTH			8
+#define MLXBF_TMFIFO_TX_CTL__LWM_RESET_VAL		128
+#define MLXBF_TMFIFO_TX_CTL__LWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__LWM_MASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__HWM_SHIFT			8
+#define MLXBF_TMFIFO_TX_CTL__HWM_WIDTH			8
+#define MLXBF_TMFIFO_TX_CTL__HWM_RESET_VAL		128
+#define MLXBF_TMFIFO_TX_CTL__HWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__HWM_MASK			GENMASK_ULL(15, 8)
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_SHIFT		32
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_WIDTH		9
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RESET_VAL	256
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK		GENMASK_ULL(40, 32)
+#define MLXBF_TMFIFO_RX_DATA				0x00
+#define MLXBF_TMFIFO_RX_STS				0x08
+#define MLXBF_TMFIFO_RX_STS__LENGTH			0x0001
+#define MLXBF_TMFIFO_RX_STS__COUNT_SHIFT		0
+#define MLXBF_TMFIFO_RX_STS__COUNT_WIDTH		9
+#define MLXBF_TMFIFO_RX_STS__COUNT_RESET_VAL		0
+#define MLXBF_TMFIFO_RX_STS__COUNT_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_RX_STS__COUNT_MASK			GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_RX_CTL				0x10
+#define MLXBF_TMFIFO_RX_CTL__LENGTH			0x0001
+#define MLXBF_TMFIFO_RX_CTL__LWM_SHIFT			0
+#define MLXBF_TMFIFO_RX_CTL__LWM_WIDTH			8
+#define MLXBF_TMFIFO_RX_CTL__LWM_RESET_VAL		128
+#define MLXBF_TMFIFO_RX_CTL__LWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__LWM_MASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__HWM_SHIFT			8
+#define MLXBF_TMFIFO_RX_CTL__HWM_WIDTH			8
+#define MLXBF_TMFIFO_RX_CTL__HWM_RESET_VAL		128
+#define MLXBF_TMFIFO_RX_CTL__HWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__HWM_MASK			GENMASK_ULL(15, 8)
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_SHIFT		32
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_WIDTH		9
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RESET_VAL	256
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK		GENMASK_ULL(40, 32)
+
+#endif /* !defined(__MLXBF_TMFIFO_REGS_H__) */
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo.c b/drivers/platform/mellanox/mlxbf-tmfifo.c
new file mode 100644
index 0000000..d9b7008
--- /dev/null
+++ b/drivers/platform/mellanox/mlxbf-tmfifo.c
@@ -0,0 +1,1294 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Mellanox BlueField SoC TmFifo driver
+ *
+ * Copyright (C) 2019 Mellanox Technologies
+ */
+
+#include <linux/acpi.h>
+#include <linux/bitfield.h>
+#include <linux/circ_buf.h>
+#include <linux/efi.h>
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/platform_device.h>
+#include <linux/types.h>
+
+#include <linux/virtio_config.h>
+#include <linux/virtio_console.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_net.h>
+#include <linux/virtio_ring.h>
+
+#include "mlxbf-tmfifo-regs.h"
+
+/* Vring size. */
+#define MLXBF_TMFIFO_VRING_SIZE			SZ_1K
+
+/* Console Tx buffer size. */
+#define MLXBF_TMFIFO_CON_TX_BUF_SIZE		SZ_32K
+
+/* Console Tx buffer reserved space. */
+#define MLXBF_TMFIFO_CON_TX_BUF_RSV_SIZE	8
+
+/* House-keeping timer interval. */
+#define MLXBF_TMFIFO_TIMER_INTERVAL		(HZ / 10)
+
+/* Virtual devices sharing the TM FIFO. */
+#define MLXBF_TMFIFO_VDEV_MAX		(VIRTIO_ID_CONSOLE + 1)
+
+/*
+ * Reserve 1/16 of TmFifo space, so console messages are not starved by
+ * the networking traffic.
+ */
+#define MLXBF_TMFIFO_RESERVE_RATIO		16
+
+/* Message with data needs at least two words (for header & data). */
+#define MLXBF_TMFIFO_DATA_MIN_WORDS		2
+
+struct mlxbf_tmfifo;
+
+/**
+ * mlxbf_tmfifo_vring - Structure of the TmFifo virtual ring
+ * @va: virtual address of the ring
+ * @dma: dma address of the ring
+ * @vq: pointer to the virtio virtqueue
+ * @desc: current descriptor of the pending packet
+ * @desc_head: head descriptor of the pending packet
+ * @cur_len: processed length of the current descriptor
+ * @rem_len: remaining length of the pending packet
+ * @pkt_len: total length of the pending packet
+ * @next_avail: next avail descriptor id
+ * @num: vring size (number of descriptors)
+ * @align: vring alignment size
+ * @index: vring index
+ * @vdev_id: vring virtio id (VIRTIO_ID_xxx)
+ * @fifo: pointer to the tmfifo structure
+ */
+struct mlxbf_tmfifo_vring {
+	void *va;
+	dma_addr_t dma;
+	struct virtqueue *vq;
+	struct vring_desc *desc;
+	struct vring_desc *desc_head;
+	int cur_len;
+	int rem_len;
+	u32 pkt_len;
+	u16 next_avail;
+	int num;
+	int align;
+	int index;
+	int vdev_id;
+	struct mlxbf_tmfifo *fifo;
+};
+
+/* Interrupt types. */
+enum {
+	MLXBF_TM_RX_LWM_IRQ,
+	MLXBF_TM_RX_HWM_IRQ,
+	MLXBF_TM_TX_LWM_IRQ,
+	MLXBF_TM_TX_HWM_IRQ,
+	MLXBF_TM_MAX_IRQ
+};
+
+/* Ring types (Rx & Tx). */
+enum {
+	MLXBF_TMFIFO_VRING_RX,
+	MLXBF_TMFIFO_VRING_TX,
+	MLXBF_TMFIFO_VRING_MAX
+};
+
+/**
+ * mlxbf_tmfifo_vdev - Structure of the TmFifo virtual device
+ * @vdev: virtio device, in which the vdev.id.device field has the
+ *        VIRTIO_ID_xxx id to distinguish the virtual device.
+ * @status: status of the device
+ * @features: supported features of the device
+ * @vrings: array of tmfifo vrings of this device
+ * @config.cons: virtual console config -
+ *               select if vdev.id.device is VIRTIO_ID_CONSOLE
+ * @config.net: virtual network config -
+ *              select if vdev.id.device is VIRTIO_ID_NET
+ * @tx_buf: tx buffer used to buffer data before writing into the FIFO
+ */
+struct mlxbf_tmfifo_vdev {
+	struct virtio_device vdev;
+	u8 status;
+	u64 features;
+	struct mlxbf_tmfifo_vring vrings[MLXBF_TMFIFO_VRING_MAX];
+	union {
+		struct virtio_console_config cons;
+		struct virtio_net_config net;
+	} config;
+	struct circ_buf tx_buf;
+};
+
+/**
+ * mlxbf_tmfifo_irq_info - Structure of the interrupt information
+ * @fifo: pointer to the tmfifo structure
+ * @irq: interrupt number
+ * @index: index into the interrupt array
+ */
+struct mlxbf_tmfifo_irq_info {
+	struct mlxbf_tmfifo *fifo;
+	int irq;
+	int index;
+};
+
+/**
+ * mlxbf_tmfifo - Structure of the TmFifo
+ * @vdev: array of the virtual devices running over the TmFifo
+ * @lock: lock to protect the TmFifo access
+ * @rx_base: mapped register base address for the Rx FIFO
+ * @tx_base: mapped register base address for the Tx FIFO
+ * @rx_fifo_size: number of entries of the Rx FIFO
+ * @tx_fifo_size: number of entries of the Tx FIFO
+ * @pend_events: pending bits for deferred events
+ * @irq_info: interrupt information
+ * @work: work struct for deferred process
+ * @timer: background timer
+ * @vring: Tx/Rx ring
+ * @spin_lock: spin lock
+ * @is_ready: ready flag
+ */
+struct mlxbf_tmfifo {
+	struct mlxbf_tmfifo_vdev *vdev[MLXBF_TMFIFO_VDEV_MAX];
+	struct mutex lock;		/* TmFifo lock */
+	void __iomem *rx_base;
+	void __iomem *tx_base;
+	int rx_fifo_size;
+	int tx_fifo_size;
+	unsigned long pend_events;
+	struct mlxbf_tmfifo_irq_info irq_info[MLXBF_TM_MAX_IRQ];
+	struct work_struct work;
+	struct timer_list timer;
+	struct mlxbf_tmfifo_vring *vring[2];
+	spinlock_t spin_lock;		/* spin lock */
+	bool is_ready;
+};
+
+/**
+ * mlxbf_tmfifo_msg_hdr - Structure of the TmFifo message header
+ * @type: message type
+ * @len: payload length
+ * @data: 64-bit data used to write the message header into the TmFifo register.
+ *
+ * This message header is a union of struct and u64 data. The 'struct' has
+ * type and length field which are used to encode & decode the message. The
+ * 'data' field is used to read/write the message header from/to the FIFO.
+ */
+union mlxbf_tmfifo_msg_hdr {
+	struct {
+		u8 type;
+		__be16 len;
+		u8 unused[5];
+	} __packed;
+	u64 data;
+};
+
+/*
+ * Default MAC.
+ * This MAC address will be read from EFI persistent variable if configured.
+ * It can also be reconfigured with standard Linux tools.
+ */
+static u8 mlxbf_tmfifo_net_default_mac[ETH_ALEN] = {
+	0x00, 0x1A, 0xCA, 0xFF, 0xFF, 0x01
+};
+
+/* EFI variable name of the MAC address. */
+static efi_char16_t mlxbf_tmfifo_efi_name[] = L"RshimMacAddr";
+
+/* Maximum L2 header length. */
+#define MLXBF_TMFIFO_NET_L2_OVERHEAD	36
+
+/* Supported virtio-net features. */
+#define MLXBF_TMFIFO_NET_FEATURES \
+	(BIT_ULL(VIRTIO_NET_F_MTU) | BIT_ULL(VIRTIO_NET_F_STATUS) | \
+	 BIT_ULL(VIRTIO_NET_F_MAC))
+
+#define mlxbf_vdev_to_tmfifo(d) container_of(d, struct mlxbf_tmfifo_vdev, vdev)
+
+/* Free vrings of the FIFO device. */
+static void mlxbf_tmfifo_free_vrings(struct mlxbf_tmfifo *fifo,
+				     struct mlxbf_tmfifo_vdev *tm_vdev)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	int i, size;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+		if (vring->va) {
+			size = vring_size(vring->num, vring->align);
+			dma_free_coherent(tm_vdev->vdev.dev.parent, size,
+					  vring->va, vring->dma);
+			vring->va = NULL;
+			if (vring->vq) {
+				vring_del_virtqueue(vring->vq);
+				vring->vq = NULL;
+			}
+		}
+	}
+}
+
+/* Allocate vrings for the FIFO. */
+static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
+				     struct mlxbf_tmfifo_vdev *tm_vdev)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	struct device *dev;
+	dma_addr_t dma;
+	int i, size;
+	void *va;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+		vring->fifo = fifo;
+		vring->num = MLXBF_TMFIFO_VRING_SIZE;
+		vring->align = SMP_CACHE_BYTES;
+		vring->index = i;
+		vring->vdev_id = tm_vdev->vdev.id.device;
+		dev = &tm_vdev->vdev.dev;
+
+		size = vring_size(vring->num, vring->align);
+		va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL);
+		if (!va) {
+			dev_err(dev->parent, "dma_alloc_coherent failed\n");
+			mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
+			return -ENOMEM;
+		}
+
+		vring->va = va;
+		vring->dma = dma;
+	}
+
+	return 0;
+}
+
+/* Disable interrupts of the FIFO device. */
+static void mlxbf_tmfifo_disable_irqs(struct mlxbf_tmfifo *fifo)
+{
+	int i, irq;
+
+	for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
+		irq = fifo->irq_info[i].irq;
+		fifo->irq_info[i].irq = 0;
+		disable_irq(irq);
+	}
+}
+
+/* Interrupt handler. */
+static irqreturn_t mlxbf_tmfifo_irq_handler(int irq, void *arg)
+{
+	struct mlxbf_tmfifo_irq_info *irq_info = arg;
+
+	if (irq_info->index < MLXBF_TM_MAX_IRQ &&
+	    !test_and_set_bit(irq_info->index, &irq_info->fifo->pend_events))
+		schedule_work(&irq_info->fifo->work);
+
+	return IRQ_HANDLED;
+}
+
+/* Get the next packet descriptor from the vring. */
+static struct vring_desc *
+mlxbf_tmfifo_get_next_desc(struct mlxbf_tmfifo_vring *vring)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	unsigned int idx, head;
+
+	if (vring->next_avail == virtio16_to_cpu(vdev, vr->avail->idx))
+		return NULL;
+
+	idx = vring->next_avail % vr->num;
+	head = virtio16_to_cpu(vdev, vr->avail->ring[idx]);
+	if (WARN_ON(head >= vr->num))
+		return NULL;
+
+	vring->next_avail++;
+
+	return &vr->desc[head];
+}
+
+/* Release virtio descriptor. */
+static void mlxbf_tmfifo_release_desc(struct mlxbf_tmfifo_vring *vring,
+				      struct vring_desc *desc, u32 len)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	u16 idx, vr_idx;
+
+	vr_idx = virtio16_to_cpu(vdev, vr->used->idx);
+	idx = vr_idx % vr->num;
+	vr->used->ring[idx].id = cpu_to_virtio32(vdev, desc - vr->desc);
+	vr->used->ring[idx].len = cpu_to_virtio32(vdev, len);
+
+	/*
+	 * Virtio could poll and check the 'idx' to decide whether the desc is
+	 * done or not. Add a memory barrier here to make sure the update above
+	 * completes before updating the idx.
+	 */
+	mb();
+	vr->used->idx = cpu_to_virtio16(vdev, vr_idx + 1);
+}
+
+/* Get the total length of the descriptor chain. */
+static u32 mlxbf_tmfifo_get_pkt_len(struct mlxbf_tmfifo_vring *vring,
+				    struct vring_desc *desc)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	u32 len = 0, idx;
+
+	while (desc) {
+		len += virtio32_to_cpu(vdev, desc->len);
+		if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
+			break;
+		idx = virtio16_to_cpu(vdev, desc->next);
+		desc = &vr->desc[idx];
+	}
+
+	return len;
+}
+
+static void mlxbf_tmfifo_release_pending_pkt(struct mlxbf_tmfifo_vring *vring)
+{
+	struct vring_desc *desc_head;
+	u32 len = 0;
+
+	if (vring->desc_head) {
+		desc_head = vring->desc_head;
+		len = vring->pkt_len;
+	} else {
+		desc_head = mlxbf_tmfifo_get_next_desc(vring);
+		if (desc_head)
+			len = mlxbf_tmfifo_get_pkt_len(vring, desc_head);
+	}
+
+	if (desc_head)
+		mlxbf_tmfifo_release_desc(vring, desc_head, len);
+
+	vring->pkt_len = 0;
+	vring->desc = NULL;
+	vring->desc_head = NULL;
+}
+
+static void mlxbf_tmfifo_init_net_desc(struct mlxbf_tmfifo_vring *vring,
+				       struct vring_desc *desc, bool is_rx)
+{
+	struct virtio_device *vdev = vring->vq->vdev;
+	struct virtio_net_hdr *net_hdr;
+
+	net_hdr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+	memset(net_hdr, 0, sizeof(*net_hdr));
+}
+
+/* Get and initialize the next packet. */
+static struct vring_desc *
+mlxbf_tmfifo_get_next_pkt(struct mlxbf_tmfifo_vring *vring, bool is_rx)
+{
+	struct vring_desc *desc;
+
+	desc = mlxbf_tmfifo_get_next_desc(vring);
+	if (desc && is_rx && vring->vdev_id == VIRTIO_ID_NET)
+		mlxbf_tmfifo_init_net_desc(vring, desc, is_rx);
+
+	vring->desc_head = desc;
+	vring->desc = desc;
+
+	return desc;
+}
+
+/* House-keeping timer. */
+static void mlxbf_tmfifo_timer(struct timer_list *t)
+{
+	struct mlxbf_tmfifo *fifo = container_of(t, struct mlxbf_tmfifo, timer);
+	int rx, tx;
+
+	rx = !test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events);
+	tx = !test_and_set_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events);
+
+	if (rx || tx)
+		schedule_work(&fifo->work);
+
+	mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
+}
+
+/* Copy one console packet into the output buffer. */
+static void mlxbf_tmfifo_console_output_one(struct mlxbf_tmfifo_vdev *cons,
+					    struct mlxbf_tmfifo_vring *vring,
+					    struct vring_desc *desc)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = &cons->vdev;
+	u32 len, idx, seg;
+	void *addr;
+
+	while (desc) {
+		addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+		len = virtio32_to_cpu(vdev, desc->len);
+
+		seg = CIRC_SPACE_TO_END(cons->tx_buf.head, cons->tx_buf.tail,
+					MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+		if (len <= seg) {
+			memcpy(cons->tx_buf.buf + cons->tx_buf.head, addr, len);
+		} else {
+			memcpy(cons->tx_buf.buf + cons->tx_buf.head, addr, seg);
+			addr += seg;
+			memcpy(cons->tx_buf.buf, addr, len - seg);
+		}
+		cons->tx_buf.head = (cons->tx_buf.head + len) %
+			MLXBF_TMFIFO_CON_TX_BUF_SIZE;
+
+		if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
+			break;
+		idx = virtio16_to_cpu(vdev, desc->next);
+		desc = &vr->desc[idx];
+	}
+}
+
+/* Copy console data into the output buffer. */
+static void mlxbf_tmfifo_console_output(struct mlxbf_tmfifo_vdev *cons,
+					struct mlxbf_tmfifo_vring *vring)
+{
+	struct vring_desc *desc;
+	u32 len, avail;
+
+	desc = mlxbf_tmfifo_get_next_desc(vring);
+	while (desc) {
+		/* Release the packet if not enough space. */
+		len = mlxbf_tmfifo_get_pkt_len(vring, desc);
+		avail = CIRC_SPACE(cons->tx_buf.head, cons->tx_buf.tail,
+				   MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+		if (len + MLXBF_TMFIFO_CON_TX_BUF_RSV_SIZE > avail) {
+			mlxbf_tmfifo_release_desc(vring, desc, len);
+			break;
+		}
+
+		mlxbf_tmfifo_console_output_one(cons, vring, desc);
+		mlxbf_tmfifo_release_desc(vring, desc, len);
+		desc = mlxbf_tmfifo_get_next_desc(vring);
+	}
+}
+
+/* Get the number of available words in Rx FIFO for receiving. */
+static int mlxbf_tmfifo_get_rx_avail(struct mlxbf_tmfifo *fifo)
+{
+	u64 sts;
+
+	sts = readq(fifo->rx_base + MLXBF_TMFIFO_RX_STS);
+	return FIELD_GET(MLXBF_TMFIFO_RX_STS__COUNT_MASK, sts);
+}
+
+/* Get the number of available words in the TmFifo for sending. */
+static int mlxbf_tmfifo_get_tx_avail(struct mlxbf_tmfifo *fifo, int vdev_id)
+{
+	int tx_reserve;
+	u32 count;
+	u64 sts;
+
+	/* Reserve some room in FIFO for console messages. */
+	if (vdev_id == VIRTIO_ID_NET)
+		tx_reserve = fifo->tx_fifo_size / MLXBF_TMFIFO_RESERVE_RATIO;
+	else
+		tx_reserve = 1;
+
+	sts = readq(fifo->tx_base + MLXBF_TMFIFO_TX_STS);
+	count = FIELD_GET(MLXBF_TMFIFO_TX_STS__COUNT_MASK, sts);
+	return fifo->tx_fifo_size - tx_reserve - count;
+}
+
+/* Console Tx (move data from the output buffer into the TmFifo). */
+static void mlxbf_tmfifo_console_tx(struct mlxbf_tmfifo *fifo, int avail)
+{
+	union mlxbf_tmfifo_msg_hdr hdr;
+	struct mlxbf_tmfifo_vdev *cons;
+	unsigned long flags;
+	int size, seg;
+	void *addr;
+	u64 data;
+
+	/* Return if not enough space available. */
+	if (avail < MLXBF_TMFIFO_DATA_MIN_WORDS)
+		return;
+
+	cons = fifo->vdev[VIRTIO_ID_CONSOLE];
+	if (!cons || !cons->tx_buf.buf)
+		return;
+
+	/* Return if no data to send. */
+	size = CIRC_CNT(cons->tx_buf.head, cons->tx_buf.tail,
+			MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+	if (size == 0)
+		return;
+
+	/* Adjust the size to available space. */
+	if (size + sizeof(hdr) > avail * sizeof(u64))
+		size = avail * sizeof(u64) - sizeof(hdr);
+
+	/* Write header. */
+	hdr.data = 0;
+	hdr.type = VIRTIO_ID_CONSOLE;
+	hdr.len = htons(size);
+	writeq(hdr.data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+
+	/* Use spin-lock to protect the 'cons->tx_buf'. */
+	spin_lock_irqsave(&fifo->spin_lock, flags);
+
+	while (size > 0) {
+		addr = cons->tx_buf.buf + cons->tx_buf.tail;
+
+		seg = CIRC_CNT_TO_END(cons->tx_buf.head, cons->tx_buf.tail,
+				      MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+		if (seg >= sizeof(u64)) {
+			memcpy(&data, addr, sizeof(u64));
+		} else {
+			memcpy(&data, addr, seg);
+			memcpy((u8 *)&data + seg, cons->tx_buf.buf,
+			       sizeof(u64) - seg);
+		}
+		writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+
+		if (size >= sizeof(u64)) {
+			cons->tx_buf.tail = (cons->tx_buf.tail + sizeof(u64)) %
+				MLXBF_TMFIFO_CON_TX_BUF_SIZE;
+			size -= sizeof(u64);
+		} else {
+			cons->tx_buf.tail = (cons->tx_buf.tail + size) %
+				MLXBF_TMFIFO_CON_TX_BUF_SIZE;
+			size = 0;
+		}
+	}
+
+	spin_unlock_irqrestore(&fifo->spin_lock, flags);
+}
+
+/* Rx/Tx one word in the descriptor buffer. */
+static void mlxbf_tmfifo_rxtx_word(struct mlxbf_tmfifo_vring *vring,
+				   struct vring_desc *desc,
+				   bool is_rx, int len)
+{
+	struct virtio_device *vdev = vring->vq->vdev;
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	void *addr;
+	u64 data;
+
+	/* Get the buffer address of this desc. */
+	addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+
+	/* Read a word from FIFO for Rx. */
+	if (is_rx)
+		data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
+
+	if (vring->cur_len + sizeof(u64) <= len) {
+		/* The whole word. */
+		if (is_rx)
+			memcpy(addr + vring->cur_len, &data, sizeof(u64));
+		else
+			memcpy(&data, addr + vring->cur_len, sizeof(u64));
+		vring->cur_len += sizeof(u64);
+	} else {
+		/* Leftover bytes. */
+		if (is_rx)
+			memcpy(addr + vring->cur_len, &data,
+			       len - vring->cur_len);
+		else
+			memcpy(&data, addr + vring->cur_len,
+			       len - vring->cur_len);
+		vring->cur_len = len;
+	}
+
+	/* Write the word into FIFO for Tx. */
+	if (!is_rx)
+		writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+}
+
+/*
+ * Rx/Tx packet header.
+ *
+ * In the Rx case, the packet might be found to belong to a different vring
+ * since the TmFifo is shared by different services. In such a case, the
+ * 'vring_change' flag is set.
+ */
+static void mlxbf_tmfifo_rxtx_header(struct mlxbf_tmfifo_vring *vring,
+				     struct vring_desc *desc,
+				     bool is_rx, bool *vring_change)
+{
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	struct virtio_net_config *config;
+	union mlxbf_tmfifo_msg_hdr hdr;
+	int vdev_id, hdr_len;
+
+	/* Read/Write packet header. */
+	if (is_rx) {
+		/* Drain one word from the FIFO. */
+		hdr.data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
+
+		/* Skip the length 0 packets (keepalive). */
+		if (hdr.len == 0)
+			return;
+
+		/* Check packet type. */
+		if (hdr.type == VIRTIO_ID_NET) {
+			vdev_id = VIRTIO_ID_NET;
+			hdr_len = sizeof(struct virtio_net_hdr);
+			config = &fifo->vdev[vdev_id]->config.net;
+			if (ntohs(hdr.len) > config->mtu +
+			    MLXBF_TMFIFO_NET_L2_OVERHEAD)
+				return;
+		} else {
+			vdev_id = VIRTIO_ID_CONSOLE;
+			hdr_len = 0;
+		}
+
+		/*
+		 * Check whether the new packet still belongs to this vring.
+		 * If not, update the pkt_len of the new vring.
+		 */
+		if (vdev_id != vring->vdev_id) {
+			struct mlxbf_tmfifo_vdev *tm_dev2 = fifo->vdev[vdev_id];
+
+			if (!tm_dev2)
+				return;
+			vring->desc = desc;
+			vring = &tm_dev2->vrings[MLXBF_TMFIFO_VRING_RX];
+			*vring_change = true;
+		}
+		vring->pkt_len = ntohs(hdr.len) + hdr_len;
+	} else {
+		/* Network virtio has an extra header. */
+		hdr_len = (vring->vdev_id == VIRTIO_ID_NET) ?
+			   sizeof(struct virtio_net_hdr) : 0;
+		vring->pkt_len = mlxbf_tmfifo_get_pkt_len(vring, desc);
+		hdr.data = 0;
+		hdr.type = (vring->vdev_id == VIRTIO_ID_NET) ?
+			    VIRTIO_ID_NET : VIRTIO_ID_CONSOLE;
+		hdr.len = htons(vring->pkt_len - hdr_len);
+		writeq(hdr.data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+	}
+
+	vring->cur_len = hdr_len;
+	vring->rem_len = vring->pkt_len;
+	fifo->vring[is_rx] = vring;
+}
+
+/*
+ * Rx/Tx one descriptor.
+ *
+ * Return true to indicate more data available.
+ */
+static bool mlxbf_tmfifo_rxtx_one_desc(struct mlxbf_tmfifo_vring *vring,
+				       bool is_rx, int *avail)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	struct virtio_device *vdev;
+	bool vring_change = false;
+	struct vring_desc *desc;
+	unsigned long flags;
+	u32 len, idx;
+
+	vdev = &fifo->vdev[vring->vdev_id]->vdev;
+
+	/* Get the descriptor of the next packet. */
+	if (!vring->desc) {
+		desc = mlxbf_tmfifo_get_next_pkt(vring, is_rx);
+		if (!desc)
+			return false;
+	} else {
+		desc = vring->desc;
+	}
+
+	/* Beginning of a packet. Start to Rx/Tx packet header. */
+	if (vring->pkt_len == 0) {
+		mlxbf_tmfifo_rxtx_header(vring, desc, is_rx, &vring_change);
+		(*avail)--;
+
+		/* Return if new packet is for another ring. */
+		if (vring_change)
+			return false;
+		goto mlxbf_tmfifo_desc_done;
+	}
+
+	/* Get the length of this desc. */
+	len = virtio32_to_cpu(vdev, desc->len);
+	if (len > vring->rem_len)
+		len = vring->rem_len;
+
+	/* Rx/Tx one word (8 bytes) if not done. */
+	if (vring->cur_len < len) {
+		mlxbf_tmfifo_rxtx_word(vring, desc, is_rx, len);
+		(*avail)--;
+	}
+
+	/* Check again whether it's done. */
+	if (vring->cur_len == len) {
+		vring->cur_len = 0;
+		vring->rem_len -= len;
+
+		/* Get the next desc on the chain. */
+		if (vring->rem_len > 0 &&
+		    (virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT)) {
+			idx = virtio16_to_cpu(vdev, desc->next);
+			desc = &vr->desc[idx];
+			goto mlxbf_tmfifo_desc_done;
+		}
+
+		/* Done and release the pending packet. */
+		mlxbf_tmfifo_release_pending_pkt(vring);
+		desc = NULL;
+		fifo->vring[is_rx] = NULL;
+
+		/* Notify upper layer that packet is done. */
+		spin_lock_irqsave(&fifo->spin_lock, flags);
+		vring_interrupt(0, vring->vq);
+		spin_unlock_irqrestore(&fifo->spin_lock, flags);
+	}
+
+mlxbf_tmfifo_desc_done:
+	/* Save the current desc. */
+	vring->desc = desc;
+
+	return true;
+}
+
+/* Rx & Tx processing of a queue. */
+static void mlxbf_tmfifo_rxtx(struct mlxbf_tmfifo_vring *vring, bool is_rx)
+{
+	int avail = 0, devid = vring->vdev_id;
+	struct mlxbf_tmfifo *fifo;
+	bool more;
+
+	fifo = vring->fifo;
+
+	/* Return if vdev is not ready. */
+	if (!fifo->vdev[devid])
+		return;
+
+	/* Return if another vring is running. */
+	if (fifo->vring[is_rx] && fifo->vring[is_rx] != vring)
+		return;
+
+	/* Only handle console and network for now. */
+	if (WARN_ON(devid != VIRTIO_ID_NET && devid != VIRTIO_ID_CONSOLE))
+		return;
+
+	do {
+		/* Get available FIFO space. */
+		if (avail == 0) {
+			if (is_rx)
+				avail = mlxbf_tmfifo_get_rx_avail(fifo);
+			else
+				avail = mlxbf_tmfifo_get_tx_avail(fifo, devid);
+			if (avail <= 0)
+				break;
+		}
+
+		/* Console output always comes from the Tx buffer. */
+		if (!is_rx && devid == VIRTIO_ID_CONSOLE) {
+			mlxbf_tmfifo_console_tx(fifo, avail);
+			break;
+		}
+
+		/* Handle one descriptor. */
+		more = mlxbf_tmfifo_rxtx_one_desc(vring, is_rx, &avail);
+	} while (more);
+}
+
+/* Handle Rx or Tx queues. */
+static void mlxbf_tmfifo_work_rxtx(struct mlxbf_tmfifo *fifo, int queue_id,
+				   int irq_id, bool is_rx)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct mlxbf_tmfifo_vring *vring;
+	int i;
+
+	if (!test_and_clear_bit(irq_id, &fifo->pend_events) ||
+	    !fifo->irq_info[irq_id].irq)
+		return;
+
+	for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++) {
+		tm_vdev = fifo->vdev[i];
+		if (tm_vdev) {
+			vring = &tm_vdev->vrings[queue_id];
+			if (vring->vq)
+				mlxbf_tmfifo_rxtx(vring, is_rx);
+		}
+	}
+}
+
+/* Work handler for Rx and Tx case. */
+static void mlxbf_tmfifo_work_handler(struct work_struct *work)
+{
+	struct mlxbf_tmfifo *fifo;
+
+	fifo = container_of(work, struct mlxbf_tmfifo, work);
+	if (!fifo->is_ready)
+		return;
+
+	mutex_lock(&fifo->lock);
+
+	/* Tx (Send data to the TmFifo). */
+	mlxbf_tmfifo_work_rxtx(fifo, MLXBF_TMFIFO_VRING_TX,
+			       MLXBF_TM_TX_LWM_IRQ, false);
+
+	/* Rx (Receive data from the TmFifo). */
+	mlxbf_tmfifo_work_rxtx(fifo, MLXBF_TMFIFO_VRING_RX,
+			       MLXBF_TM_RX_HWM_IRQ, true);
+
+	mutex_unlock(&fifo->lock);
+}
+
+/* The notify function is called when new buffers are posted. */
+static bool mlxbf_tmfifo_virtio_notify(struct virtqueue *vq)
+{
+	struct mlxbf_tmfifo_vring *vring = vq->priv;
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct mlxbf_tmfifo *fifo;
+	unsigned long flags;
+
+	fifo = vring->fifo;
+
+	/*
+	 * Virtio maintains vrings in pairs, even number ring for Rx
+	 * and odd number ring for Tx.
+	 */
+	if (!(vring->index & BIT(0))) {
+		if (test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events))
+			return true;
+	} else {
+		/*
+		 * Console could make blocking call with interrupts disabled.
+		 * In such case, the vring needs to be served right away. For
+		 * other cases, just set the TX LWM bit to start Tx in the
+		 * worker handler.
+		 */
+		if (vring->vdev_id == VIRTIO_ID_CONSOLE) {
+			spin_lock_irqsave(&fifo->spin_lock, flags);
+			tm_vdev = fifo->vdev[VIRTIO_ID_CONSOLE];
+			mlxbf_tmfifo_console_output(tm_vdev, vring);
+			spin_unlock_irqrestore(&fifo->spin_lock, flags);
+		} else if (test_and_set_bit(MLXBF_TM_TX_LWM_IRQ,
+					    &fifo->pend_events)) {
+			return true;
+		}
+	}
+
+	schedule_work(&fifo->work);
+
+	return true;
+}
+
+/* Get the array of feature bits for this device. */
+static u64 mlxbf_tmfifo_virtio_get_features(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	return tm_vdev->features;
+}
+
+/* Confirm device features to use. */
+static int mlxbf_tmfifo_virtio_finalize_features(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->features = vdev->features;
+
+	return 0;
+}
+
+/* Free virtqueues found by find_vqs(). */
+static void mlxbf_tmfifo_virtio_del_vqs(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+	struct mlxbf_tmfifo_vring *vring;
+	struct virtqueue *vq;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+
+		/* Release the pending packet. */
+		if (vring->desc)
+			mlxbf_tmfifo_release_pending_pkt(vring);
+		vq = vring->vq;
+		if (vq) {
+			vring->vq = NULL;
+			vring_del_virtqueue(vq);
+		}
+	}
+}
+
+/* Create and initialize the virtual queues. */
+static int mlxbf_tmfifo_virtio_find_vqs(struct virtio_device *vdev,
+					unsigned int nvqs,
+					struct virtqueue *vqs[],
+					vq_callback_t *callbacks[],
+					const char * const names[],
+					const bool *ctx,
+					struct irq_affinity *desc)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+	struct mlxbf_tmfifo_vring *vring;
+	struct virtqueue *vq;
+	int i, ret, size;
+
+	if (nvqs > ARRAY_SIZE(tm_vdev->vrings))
+		return -EINVAL;
+
+	for (i = 0; i < nvqs; ++i) {
+		if (!names[i]) {
+			ret = -EINVAL;
+			goto error;
+		}
+		vring = &tm_vdev->vrings[i];
+
+		/* zero vring */
+		size = vring_size(vring->num, vring->align);
+		memset(vring->va, 0, size);
+		vq = vring_new_virtqueue(i, vring->num, vring->align, vdev,
+					 false, false, vring->va,
+					 mlxbf_tmfifo_virtio_notify,
+					 callbacks[i], names[i]);
+		if (!vq) {
+			dev_err(&vdev->dev, "vring_new_virtqueue failed\n");
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		vqs[i] = vq;
+		vring->vq = vq;
+		vq->priv = vring;
+	}
+
+	return 0;
+
+error:
+	mlxbf_tmfifo_virtio_del_vqs(vdev);
+	return ret;
+}
+
+/* Read the status byte. */
+static u8 mlxbf_tmfifo_virtio_get_status(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	return tm_vdev->status;
+}
+
+/* Write the status byte. */
+static void mlxbf_tmfifo_virtio_set_status(struct virtio_device *vdev,
+					   u8 status)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->status = status;
+}
+
+/* Reset the device. Not much here for now. */
+static void mlxbf_tmfifo_virtio_reset(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->status = 0;
+}
+
+/* Read the value of a configuration field. */
+static void mlxbf_tmfifo_virtio_get(struct virtio_device *vdev,
+				    unsigned int offset,
+				    void *buf,
+				    unsigned int len)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	if (offset + len > sizeof(tm_vdev->config))
+		return;
+
+	memcpy(buf, (u8 *)&tm_vdev->config + offset, len);
+}
+
+/* Write the value of a configuration field. */
+static void mlxbf_tmfifo_virtio_set(struct virtio_device *vdev,
+				    unsigned int offset,
+				    const void *buf,
+				    unsigned int len)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	if (offset + len > sizeof(tm_vdev->config))
+		return;
+
+	memcpy((u8 *)&tm_vdev->config + offset, buf, len);
+}
+
+static void tmfifo_virtio_dev_release(struct device *device)
+{
+	struct virtio_device *vdev =
+			container_of(device, struct virtio_device, dev);
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	kfree(tm_vdev);
+}
+
+/* Virtio config operations. */
+static const struct virtio_config_ops mlxbf_tmfifo_virtio_config_ops = {
+	.get_features = mlxbf_tmfifo_virtio_get_features,
+	.finalize_features = mlxbf_tmfifo_virtio_finalize_features,
+	.find_vqs = mlxbf_tmfifo_virtio_find_vqs,
+	.del_vqs = mlxbf_tmfifo_virtio_del_vqs,
+	.reset = mlxbf_tmfifo_virtio_reset,
+	.set_status = mlxbf_tmfifo_virtio_set_status,
+	.get_status = mlxbf_tmfifo_virtio_get_status,
+	.get = mlxbf_tmfifo_virtio_get,
+	.set = mlxbf_tmfifo_virtio_set,
+};
+
+/* Create vdev for the FIFO. */
+static int mlxbf_tmfifo_create_vdev(struct device *dev,
+				    struct mlxbf_tmfifo *fifo,
+				    int vdev_id, u64 features,
+				    void *config, u32 size)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev, *reg_dev = NULL;
+	int ret;
+
+	mutex_lock(&fifo->lock);
+
+	tm_vdev = fifo->vdev[vdev_id];
+	if (tm_vdev) {
+		dev_err(dev, "vdev %d already exists\n", vdev_id);
+		ret = -EEXIST;
+		goto fail;
+	}
+
+	tm_vdev = kzalloc(sizeof(*tm_vdev), GFP_KERNEL);
+	if (!tm_vdev) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	tm_vdev->vdev.id.device = vdev_id;
+	tm_vdev->vdev.config = &mlxbf_tmfifo_virtio_config_ops;
+	tm_vdev->vdev.dev.parent = dev;
+	tm_vdev->vdev.dev.release = tmfifo_virtio_dev_release;
+	tm_vdev->features = features;
+	if (config)
+		memcpy(&tm_vdev->config, config, size);
+
+	if (mlxbf_tmfifo_alloc_vrings(fifo, tm_vdev)) {
+		dev_err(dev, "unable to allocate vring\n");
+		ret = -ENOMEM;
+		goto vdev_fail;
+	}
+
+	/* Allocate an output buffer for the console device. */
+	if (vdev_id == VIRTIO_ID_CONSOLE)
+		tm_vdev->tx_buf.buf = devm_kmalloc(dev,
+						   MLXBF_TMFIFO_CON_TX_BUF_SIZE,
+						   GFP_KERNEL);
+	fifo->vdev[vdev_id] = tm_vdev;
+
+	/* Register the virtio device. */
+	ret = register_virtio_device(&tm_vdev->vdev);
+	reg_dev = tm_vdev;
+	if (ret) {
+		dev_err(dev, "register_virtio_device failed\n");
+		goto vdev_fail;
+	}
+
+	mutex_unlock(&fifo->lock);
+	return 0;
+
+vdev_fail:
+	mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
+	fifo->vdev[vdev_id] = NULL;
+	if (reg_dev)
+		put_device(&tm_vdev->vdev.dev);
+	else
+		kfree(tm_vdev);
+fail:
+	mutex_unlock(&fifo->lock);
+	return ret;
+}
+
+/* Delete vdev for the FIFO. */
+static int mlxbf_tmfifo_delete_vdev(struct mlxbf_tmfifo *fifo, int vdev_id)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	mutex_lock(&fifo->lock);
+
+	/* Unregister vdev. */
+	tm_vdev = fifo->vdev[vdev_id];
+	if (tm_vdev) {
+		unregister_virtio_device(&tm_vdev->vdev);
+		mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
+		fifo->vdev[vdev_id] = NULL;
+	}
+
+	mutex_unlock(&fifo->lock);
+
+	return 0;
+}
+
+/* Read the configured network MAC address from efi variable. */
+static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
+{
+	efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
+	unsigned long size = ETH_ALEN;
+	efi_status_t status;
+	u8 buf[ETH_ALEN];
+
+	status = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size,
+				  buf);
+	if (status == EFI_SUCCESS && size == ETH_ALEN)
+		ether_addr_copy(mac, buf);
+	else
+		ether_addr_copy(mac, mlxbf_tmfifo_net_default_mac);
+}
+
+/* Set TmFifo thresholds which are used to trigger interrupts. */
+static void mlxbf_tmfifo_set_threshold(struct mlxbf_tmfifo *fifo)
+{
+	u64 ctl;
+
+	/* Get Tx FIFO size and set the low/high watermark. */
+	ctl = readq(fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
+	fifo->tx_fifo_size =
+		FIELD_GET(MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK, ctl);
+	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__LWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__LWM_MASK,
+			   fifo->tx_fifo_size / 2);
+	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__HWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__HWM_MASK,
+			   fifo->tx_fifo_size - 1);
+	writeq(ctl, fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
+
+	/* Get Rx FIFO size and set the low/high watermark. */
+	ctl = readq(fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
+	fifo->rx_fifo_size =
+		FIELD_GET(MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK, ctl);
+	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__LWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__LWM_MASK, 0);
+	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__HWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__HWM_MASK, 1);
+	writeq(ctl, fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
+}
+
+static void mlxbf_tmfifo_cleanup(struct mlxbf_tmfifo *fifo)
+{
+	int i;
+
+	fifo->is_ready = false;
+	del_timer_sync(&fifo->timer);
+	mlxbf_tmfifo_disable_irqs(fifo);
+	cancel_work_sync(&fifo->work);
+	for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++)
+		mlxbf_tmfifo_delete_vdev(fifo, i);
+}
+
+/* Probe the TMFIFO. */
+static int mlxbf_tmfifo_probe(struct platform_device *pdev)
+{
+	struct virtio_net_config net_config;
+	struct device *dev = &pdev->dev;
+	struct mlxbf_tmfifo *fifo;
+	struct resource *res;
+	int i, rc;
+
+	fifo = devm_kzalloc(dev, sizeof(*fifo), GFP_KERNEL);
+	if (!fifo)
+		return -ENOMEM;
+
+	spin_lock_init(&fifo->spin_lock);
+	INIT_WORK(&fifo->work, mlxbf_tmfifo_work_handler);
+	mutex_init(&fifo->lock);
+
+	/* Get the resource of the Rx FIFO. */
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	fifo->rx_base = devm_ioremap_resource(dev, res);
+	if (IS_ERR(fifo->rx_base))
+		return PTR_ERR(fifo->rx_base);
+
+	/* Get the resource of the Tx FIFO. */
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+	fifo->tx_base = devm_ioremap_resource(dev, res);
+	if (IS_ERR(fifo->tx_base))
+		return PTR_ERR(fifo->tx_base);
+
+	platform_set_drvdata(pdev, fifo);
+
+	timer_setup(&fifo->timer, mlxbf_tmfifo_timer, 0);
+
+	for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
+		fifo->irq_info[i].index = i;
+		fifo->irq_info[i].fifo = fifo;
+		fifo->irq_info[i].irq = platform_get_irq(pdev, i);
+		rc = devm_request_irq(dev, fifo->irq_info[i].irq,
+				      mlxbf_tmfifo_irq_handler, 0,
+				      "tmfifo", &fifo->irq_info[i]);
+		if (rc) {
+			dev_err(dev, "devm_request_irq failed\n");
+			fifo->irq_info[i].irq = 0;
+			return rc;
+		}
+	}
+
+	mlxbf_tmfifo_set_threshold(fifo);
+
+	/* Create the console vdev. */
+	rc = mlxbf_tmfifo_create_vdev(dev, fifo, VIRTIO_ID_CONSOLE, 0, NULL, 0);
+	if (rc)
+		goto fail;
+
+	/* Create the network vdev. */
+	memset(&net_config, 0, sizeof(net_config));
+	net_config.mtu = ETH_DATA_LEN;
+	net_config.status = VIRTIO_NET_S_LINK_UP;
+	mlxbf_tmfifo_get_cfg_mac(net_config.mac);
+	rc = mlxbf_tmfifo_create_vdev(dev, fifo, VIRTIO_ID_NET,
+				      MLXBF_TMFIFO_NET_FEATURES, &net_config,
+				      sizeof(net_config));
+	if (rc)
+		goto fail;
+
+	mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
+
+	fifo->is_ready = true;
+	return 0;
+
+fail:
+	mlxbf_tmfifo_cleanup(fifo);
+	return rc;
+}
+
+/* Device remove function. */
+static int mlxbf_tmfifo_remove(struct platform_device *pdev)
+{
+	struct mlxbf_tmfifo *fifo = platform_get_drvdata(pdev);
+
+	mlxbf_tmfifo_cleanup(fifo);
+
+	return 0;
+}
+
+static const struct acpi_device_id mlxbf_tmfifo_acpi_match[] = {
+	{ "MLNXBF01", 0 },
+	{}
+};
+MODULE_DEVICE_TABLE(acpi, mlxbf_tmfifo_acpi_match);
+
+static struct platform_driver mlxbf_tmfifo_driver = {
+	.probe = mlxbf_tmfifo_probe,
+	.remove = mlxbf_tmfifo_remove,
+	.driver = {
+		.name = "bf-tmfifo",
+		.acpi_match_table = mlxbf_tmfifo_acpi_match,
+	},
+};
+
+module_platform_driver(mlxbf_tmfifo_driver);
+
+MODULE_DESCRIPTION("Mellanox BlueField SoC TmFifo Driver");
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Mellanox Technologies");
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v13] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
  2019-04-05 19:10     ` Liming Sun
@ 2019-04-07  2:05       ` Liming Sun
  2019-04-11 14:13         ` Andy Shevchenko
  0 siblings, 1 reply; 30+ messages in thread
From: Liming Sun @ 2019-04-07  2:05 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak,
	Linux Kernel Mailing List, Platform Driver

Thanks Andy!  I just posted v14, which addresses all the comments you mentioned below for v13.

Regards,
Liming

> -----Original Message-----
> From: Andy Shevchenko <andy.shevchenko@gmail.com>
> Sent: Friday, April 5, 2019 11:44 AM
> To: Liming Sun <lsun@mellanox.com>
> Cc: David Woods <dwoods@mellanox.com>; Andy Shevchenko <andy@infradead.org>; Darren Hart <dvhart@infradead.org>; Vadim
> Pasternak <vadimp@mellanox.com>; Linux Kernel Mailing List <linux-kernel@vger.kernel.org>; Platform Driver <platform-driver-
> x86@vger.kernel.org>
> Subject: Re: [PATCH v13] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
>
> On Thu, Apr 4, 2019 at 10:36 PM Liming Sun <lsun@mellanox.com> wrote:
> > This commit adds the TmFifo platform driver for Mellanox BlueField
> > Soc. TmFifo is a shared FIFO which enables external host machine
> > to exchange data with the SoC via USB or PCIe. The driver is based
> > on virtio framework and has console and network access enabled.
>
> Thanks for an update. Almost good.
> My comments below.
>
> Meanwhile I pushed this to my review and testing queue, thanks!
>
> > +#include <linux/acpi.h>
> > +#include <linux/bitfield.h>
> > +#include <linux/circ_buf.h>
> > +#include <linux/efi.h>
> > +#include <linux/irq.h>
> > +#include <linux/module.h>
> > +#include <linux/mutex.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/types.h>
>
> Perhaps blank line here. Would be more clear that this is utilizing
> virtio framework.

Updated in v14.

>
> > +#include <linux/virtio_config.h>
> > +#include <linux/virtio_console.h>
> > +#include <linux/virtio_ids.h>
> > +#include <linux/virtio_net.h>
> > +#include <linux/virtio_ring.h>
>
> > +/**
> > + * mlxbf_tmfifo_msg_hdr - Structure of the TmFifo message header
> > + * @type: message type
> > + * @len: payload length
> > + * @u: 64-bit union data
> > + */
> > +union mlxbf_tmfifo_msg_hdr {
> > +       struct {
> > +               u8 type;
> > +               __be16 len;
> > +               u8 unused[5];
> > +       } __packed;
> > +       u64 data;
>
> I'm not sure I understand how you can distinguish which field of union to use?
> Isn't here some type missed?

Updated the comment in v14.

This message header is a union of a struct and u64 data. The 'struct' has
type and length fields which are used to encode & decode the message.
The 'data' field is used to read/write the message header from/to the FIFO.
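
For illustration (same field names as in the patch), this is how the console
Tx path encodes the header through the 'struct' view and then writes it out
through the 'data' view:

	union mlxbf_tmfifo_msg_hdr hdr;

	hdr.data = 0;			/* clear all 8 bytes */
	hdr.type = VIRTIO_ID_CONSOLE;	/* 'struct' view: message type */
	hdr.len = htons(size);		/* 'struct' view: payload length */
	/* 'data' view: write the whole header as one 64-bit word. */
	writeq(hdr.data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);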

>
> > +};
>
> > +static u8 mlxbf_tmfifo_net_default_mac[ETH_ALEN] = {
>
> > +       0x00, 0x1A, 0xCA, 0xFF, 0xFF, 0x01};
>
> This should be two lines.

Updated in v14.

>
> > +/* Supported virtio-net features. */
> > +#define MLXBF_TMFIFO_NET_FEATURES      (BIT_ULL(VIRTIO_NET_F_MTU) | \
> > +                                        BIT_ULL(VIRTIO_NET_F_STATUS) | \
> > +                                        BIT_ULL(VIRTIO_NET_F_MAC))
>
> Better to write as
>
> #define FOO \
> (BIT(x) | BIT(y) ...)
>
> I think I told this earlier?

Updated in v14.

>
> > +/* Allocate vrings for the fifo. */
>
> fifo -> FIFO (and check all occurrences)

Updated in v14.

>
> > +static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
> > +                                    struct mlxbf_tmfifo_vdev *tm_vdev)
> > +{
> > +       struct mlxbf_tmfifo_vring *vring;
> > +       struct device *dev;
> > +       dma_addr_t dma;
> > +       int i, size;
> > +       void *va;
> > +
> > +       for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
> > +               vring = &tm_vdev->vrings[i];
> > +               vring->fifo = fifo;
> > +               vring->num = MLXBF_TMFIFO_VRING_SIZE;
> > +               vring->align = SMP_CACHE_BYTES;
> > +               vring->index = i;
> > +               vring->vdev_id = tm_vdev->vdev.id.device;
> > +               dev = &tm_vdev->vdev.dev;
> > +
> > +               size = vring_size(vring->num, vring->align);
> > +               va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL);
> > +               if (!va) {
>
> > +                       dev_err(dev->parent, "dma_alloc_coherent failed\n");
>
> I don't see how this will free the allocated entries.
> I think I told about this either.

Updated in v14.
It's not a memory leak since the caller will release them
in case of failure. I added one line in this function to
call mlxbf_tmfifo_free_vrings() to make this clearer.
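
With that extra call, the error path in v14 looks like this (sketch):

	va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL);
	if (!va) {
		dev_err(dev->parent, "dma_alloc_coherent failed\n");
		/* Free the vrings allocated in the earlier iterations. */
		mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
		return -ENOMEM;
	}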

>
> > +                       return -ENOMEM;
> > +               }
> > +
> > +               vring->va = va;
> > +               vring->dma = dma;
> > +       }
> > +
> > +       return 0;
> > +}
>
> > +/* House-keeping timer. */
> > +static void mlxbf_tmfifo_timer(struct timer_list *arg)
> > +{
>
> > +       struct mlxbf_tmfifo *fifo = container_of(arg, struct mlxbf_tmfifo,
> > +                                                timer);
>
> One line would be still good enough.

Updated in v14.

>
> > +       int more;
> > +
> > +       more = !test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events) ||
> > +                   !test_and_set_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events);
> > +
> > +       if (more)
> > +               schedule_work(&fifo->work);
> > +
> > +       mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
> > +}
>
> > +       status = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size,
> > +                                 buf);
> > +       if (status == EFI_SUCCESS && size == ETH_ALEN)
> > +               ether_addr_copy(mac, buf);
> > +       else
>
> > +               memcpy(mac, mlxbf_tmfifo_net_default_mac, ETH_ALEN);
>
> ether_addr_copy() as well.

Updated in v14.

>
> > +}
>
> > +       fifo->pdev = pdev;
>
> Do you really need to keep pdev there? Isn't struct device pointer enough?

Not needed. Updated in v14. Thanks!

>
>
> > +       /* Create the console vdev. */
> > +       ret = mlxbf_tmfifo_create_vdev(&pdev->dev, fifo, VIRTIO_ID_CONSOLE, 0,
> > +                                      NULL, 0);
>
> If you define temporary variable
>   struct device *dev = &pdev->dev;
> these lines can be merged into one.

Yes, updated in v14.

>
> > +       if (ret)
> > +               goto fail;
>
> --
> With Best Regards,
> Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v14] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
  2019-04-07  2:03 ` [PATCH v14] " Liming Sun
@ 2019-04-11 14:09   ` Andy Shevchenko
  2019-04-12 14:23     ` Liming Sun
  0 siblings, 1 reply; 30+ messages in thread
From: Andy Shevchenko @ 2019-04-11 14:09 UTC (permalink / raw)
  To: Liming Sun
  Cc: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak,
	Linux Kernel Mailing List, Platform Driver

On Sun, Apr 7, 2019 at 5:03 AM Liming Sun <lsun@mellanox.com> wrote:
>
> This commit adds the TmFifo platform driver for Mellanox BlueField
> Soc. TmFifo is a shared FIFO which enables external host machine
> to exchange data with the SoC via USB or PCIe. The driver is based
> on virtio framework and has console and network access enabled.

Thanks for an update, my comments below.

> +/**
> + * mlxbf_tmfifo_vdev - Structure of the TmFifo virtual device
> + * @vdev: virtio device, in which the vdev.id.device field has the
> + *        VIRTIO_ID_xxx id to distinguish the virtual device.
> + * @status: status of the device
> + * @features: supported features of the device
> + * @vrings: array of tmfifo vrings of this device
> + * @config.cons: virtual console config -
> + *               select if vdev.id.device is VIRTIO_ID_CONSOLE
> + * @config.net: virtual network config -
> + *              select if vdev.id.device is VIRTIO_ID_NET
> + * @tx_buf: tx buffer used to buffer data before writing into the FIFO
> + */
> +struct mlxbf_tmfifo_vdev {
> +       struct virtio_device vdev;
> +       u8 status;
> +       u64 features;
> +       struct mlxbf_tmfifo_vring vrings[MLXBF_TMFIFO_VRING_MAX];
> +       union {
> +               struct virtio_console_config cons;
> +               struct virtio_net_config net;
> +       } config;
> +       struct circ_buf tx_buf;
> +};

(1)

> +/**
> + * mlxbf_tmfifo_msg_hdr - Structure of the TmFifo message header
> + * @type: message type
> + * @len: payload length
> + * @data: 64-bit data used to write the message header into the TmFifo register.
> + *
> + * This message header is a union of struct and u64 data. The 'struct' has
> + * type and length field which are used to encode & decode the message. The
> + * 'data' field is used to read/write the message header from/to the FIFO.
> + */
> +union mlxbf_tmfifo_msg_hdr {
> +       struct {
> +               u8 type;
> +               __be16 len;
> +               u8 unused[5];
> +       } __packed;
> +       u64 data;
> +};

This union misses a type. See, for example, above structure (1) where
union is used correctly.

> +/* Allocate vrings for the FIFO. */
> +static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
> +                                    struct mlxbf_tmfifo_vdev *tm_vdev)
> +{
> +       struct mlxbf_tmfifo_vring *vring;
> +       struct device *dev;
> +       dma_addr_t dma;
> +       int i, size;
> +       void *va;
> +
> +       for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
> +               vring = &tm_vdev->vrings[i];
> +               vring->fifo = fifo;
> +               vring->num = MLXBF_TMFIFO_VRING_SIZE;
> +               vring->align = SMP_CACHE_BYTES;
> +               vring->index = i;
> +               vring->vdev_id = tm_vdev->vdev.id.device;
> +               dev = &tm_vdev->vdev.dev;
> +
> +               size = vring_size(vring->num, vring->align);
> +               va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL);
> +               if (!va) {

> +                       dev_err(dev->parent, "dma_alloc_coherent failed\n");
> +                       mlxbf_tmfifo_free_vrings(fifo, tm_vdev);

First do things, then report about what has been done.

> +                       return -ENOMEM;
> +               }
> +
> +               vring->va = va;
> +               vring->dma = dma;
> +       }
> +
> +       return 0;
> +}

> +/* Interrupt handler. */
> +static irqreturn_t mlxbf_tmfifo_irq_handler(int irq, void *arg)
> +{
> +       struct mlxbf_tmfifo_irq_info *irq_info = arg;
> +

> +       if (irq_info->index < MLXBF_TM_MAX_IRQ &&

On which circumstances this is possible?

> +           !test_and_set_bit(irq_info->index, &irq_info->fifo->pend_events))
> +               schedule_work(&irq_info->fifo->work);
> +
> +       return IRQ_HANDLED;
> +}

> +static void mlxbf_tmfifo_release_pending_pkt(struct mlxbf_tmfifo_vring *vring)
> +{
> +       struct vring_desc *desc_head;
> +       u32 len = 0;
> +
> +       if (vring->desc_head) {
> +               desc_head = vring->desc_head;
> +               len = vring->pkt_len;
> +       } else {
> +               desc_head = mlxbf_tmfifo_get_next_desc(vring);

> +               if (desc_head)

Redundant...

> +                       len = mlxbf_tmfifo_get_pkt_len(vring, desc_head);

...this is NULL-aware AFAICS.

> +       }
> +
> +       if (desc_head)
> +               mlxbf_tmfifo_release_desc(vring, desc_head, len);
> +
> +       vring->pkt_len = 0;
> +       vring->desc = NULL;
> +       vring->desc_head = NULL;
> +}

> +/* The notify function is called when new buffers are posted. */
> +static bool mlxbf_tmfifo_virtio_notify(struct virtqueue *vq)
> +{
> +       struct mlxbf_tmfifo_vring *vring = vq->priv;
> +       struct mlxbf_tmfifo_vdev *tm_vdev;
> +       struct mlxbf_tmfifo *fifo;
> +       unsigned long flags;
> +
> +       fifo = vring->fifo;
> +
> +       /*
> +        * Virtio maintains vrings in pairs, even number ring for Rx
> +        * and odd number ring for Tx.
> +        */

> +       if (!(vring->index & BIT(0))) {

Perhaps positive conditional is better.

> +               if (test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events))
> +                       return true;
> +       } else {
> +               /*
> +                * Console could make blocking call with interrupts disabled.
> +                * In such case, the vring needs to be served right away. For
> +                * other cases, just set the TX LWM bit to start Tx in the
> +                * worker handler.
> +                */
> +               if (vring->vdev_id == VIRTIO_ID_CONSOLE) {
> +                       spin_lock_irqsave(&fifo->spin_lock, flags);
> +                       tm_vdev = fifo->vdev[VIRTIO_ID_CONSOLE];
> +                       mlxbf_tmfifo_console_output(tm_vdev, vring);
> +                       spin_unlock_irqrestore(&fifo->spin_lock, flags);
> +               } else if (test_and_set_bit(MLXBF_TM_TX_LWM_IRQ,
> +                                           &fifo->pend_events)) {
> +                       return true;
> +               }
> +       }
> +
> +       schedule_work(&fifo->work);
> +
> +       return true;
> +}

> +/* Read the value of a configuration field. */
> +static void mlxbf_tmfifo_virtio_get(struct virtio_device *vdev,
> +                                   unsigned int offset,
> +                                   void *buf,
> +                                   unsigned int len)
> +{
> +       struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
> +

> +       if (offset + len > sizeof(tm_vdev->config))
> +               return;

This doesn't protect against too big len and offset.
Same for other similar checks.

> +
> +       memcpy(buf, (u8 *)&tm_vdev->config + offset, len);
> +}

> +/* Read the configured network MAC address from efi variable. */
> +static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
> +{
> +       efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
> +       unsigned long size = ETH_ALEN;
> +       efi_status_t status;
> +       u8 buf[ETH_ALEN];
> +

> +       status = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size,
> +                                 buf);

Use one line.

> +       if (status == EFI_SUCCESS && size == ETH_ALEN)
> +               ether_addr_copy(mac, buf);
> +       else
> +               ether_addr_copy(mac, mlxbf_tmfifo_net_default_mac);
> +}

> +/* Probe the TMFIFO. */
> +static int mlxbf_tmfifo_probe(struct platform_device *pdev)
> +{

> +       res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +       fifo->rx_base = devm_ioremap_resource(dev, res);

There is new helper devm_platform_ioremap_resource().
Please, use it instead.

> +       if (IS_ERR(fifo->rx_base))
> +               return PTR_ERR(fifo->rx_base);

> +       res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
> +       fifo->tx_base = devm_ioremap_resource(dev, res);

Ditto.

> +       if (IS_ERR(fifo->tx_base))
> +               return PTR_ERR(fifo->tx_base);

> +}

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v13] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
  2019-04-07  2:05       ` Liming Sun
@ 2019-04-11 14:13         ` Andy Shevchenko
  2019-04-12 16:15           ` Liming Sun
  0 siblings, 1 reply; 30+ messages in thread
From: Andy Shevchenko @ 2019-04-11 14:13 UTC (permalink / raw)
  To: Liming Sun
  Cc: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak,
	Linux Kernel Mailing List, Platform Driver

On Sun, Apr 7, 2019 at 5:05 AM Liming Sun <lsun@mellanox.com> wrote:

> > > + * mlxbf_tmfifo_msg_hdr - Structure of the TmFifo message header
> > > + * @type: message type
> > > + * @len: payload length
> > > + * @u: 64-bit union data
> > > + */
> > > +union mlxbf_tmfifo_msg_hdr {
> > > +       struct {
> > > +               u8 type;
> > > +               __be16 len;
> > > +               u8 unused[5];
> > > +       } __packed;
> > > +       u64 data;
> >
> > I'm not sure I understand how you can distinguish which field of union to use?
> > Isn't here some type missed?
>
> Updated the comment in v14.
>
> This message header is a union of struct and u64 data.
> The 'struct' has
> type and length field which are used to encode & decode the message.
> The 'data' field is used to read/write the message header from/to the FIFO.

Something fishy here.

You are using a structure of data which you would like to write with
one call? Perhaps you need to construct this on-the-fly.
Moreover, the __be member is used in data which is written as LE.
This needs more explanation.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v14] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
  2019-04-11 14:09   ` Andy Shevchenko
@ 2019-04-12 14:23     ` Liming Sun
  0 siblings, 0 replies; 30+ messages in thread
From: Liming Sun @ 2019-04-12 14:23 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak,
	Linux Kernel Mailing List, Platform Driver

Thanks Andy! I'll try to post v15 to address these comments this weekend.
(Please also see responses to each comment below).

> -----Original Message-----
> From: Andy Shevchenko <andy.shevchenko@gmail.com>
> Sent: Thursday, April 11, 2019 10:10 AM
> To: Liming Sun <lsun@mellanox.com>
> Cc: David Woods <dwoods@mellanox.com>; Andy Shevchenko <andy@infradead.org>; Darren Hart <dvhart@infradead.org>; Vadim
> Pasternak <vadimp@mellanox.com>; Linux Kernel Mailing List <linux-kernel@vger.kernel.org>; Platform Driver <platform-driver-
> x86@vger.kernel.org>
> Subject: Re: [PATCH v14] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
> 
> On Sun, Apr 7, 2019 at 5:03 AM Liming Sun <lsun@mellanox.com> wrote:
> >
> > This commit adds the TmFifo platform driver for Mellanox BlueField
> > Soc. TmFifo is a shared FIFO which enables external host machine
> > to exchange data with the SoC via USB or PCIe. The driver is based
> > on virtio framework and has console and network access enabled.
> 
> Thanks for an update, my comments below.

Thanks for the comments!

> 
> > +/**
> > + * mlxbf_tmfifo_vdev - Structure of the TmFifo virtual device
> > + * @vdev: virtio device, in which the vdev.id.device field has the
> > + *        VIRTIO_ID_xxx id to distinguish the virtual device.
> > + * @status: status of the device
> > + * @features: supported features of the device
> > + * @vrings: array of tmfifo vrings of this device
> > + * @config.cons: virtual console config -
> > + *               select if vdev.id.device is VIRTIO_ID_CONSOLE
> > + * @config.net: virtual network config -
> > + *              select if vdev.id.device is VIRTIO_ID_NET
> > + * @tx_buf: tx buffer used to buffer data before writing into the FIFO
> > + */
> > +struct mlxbf_tmfifo_vdev {
> > +       struct virtio_device vdev;
> > +       u8 status;
> > +       u64 features;
> > +       struct mlxbf_tmfifo_vring vrings[MLXBF_TMFIFO_VRING_MAX];
> > +       union {
> > +               struct virtio_console_config cons;
> > +               struct virtio_net_config net;
> > +       } config;
> > +       struct circ_buf tx_buf;
> > +};
> 
> (1)
> 
> > +/**
> > + * mlxbf_tmfifo_msg_hdr - Structure of the TmFifo message header
> > + * @type: message type
> > + * @len: payload length
> > + * @data: 64-bit data used to write the message header into the TmFifo register.
> > + *
> > + * This message header is a union of struct and u64 data. The 'struct' has
> > + * type and length field which are used to encode & decode the message. The
> > + * 'data' field is used to read/write the message header from/to the FIFO.
> > + */
> > +union mlxbf_tmfifo_msg_hdr {
> > +       struct {
> > +               u8 type;
> > +               __be16 len;
> > +               u8 unused[5];
> > +       } __packed;
> > +       u64 data;
> > +};
> 
> This union misses a type. See, for example, above structure (1) where
> union is used correctly.

This union seems to be causing confusion. I'll try to remove the union in v15
and "construct this on-the-fly" just like you mentioned in another email.
So instead of "writeq(hdr.data, ...)" we could simply do
"writeq(*(u64 *)&hdr, ...)", thus no need for a union.

> 
> > +/* Allocate vrings for the FIFO. */
> > +static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
> > +                                    struct mlxbf_tmfifo_vdev *tm_vdev)
> > +{
> > +       struct mlxbf_tmfifo_vring *vring;
> > +       struct device *dev;
> > +       dma_addr_t dma;
> > +       int i, size;
> > +       void *va;
> > +
> > +       for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
> > +               vring = &tm_vdev->vrings[i];
> > +               vring->fifo = fifo;
> > +               vring->num = MLXBF_TMFIFO_VRING_SIZE;
> > +               vring->align = SMP_CACHE_BYTES;
> > +               vring->index = i;
> > +               vring->vdev_id = tm_vdev->vdev.id.device;
> > +               dev = &tm_vdev->vdev.dev;
> > +
> > +               size = vring_size(vring->num, vring->align);
> > +               va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL);
> > +               if (!va) {
> 
> > +                       dev_err(dev->parent, "dma_alloc_coherent failed\n");
> > +                       mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
> 
> First do things, then report about what has been done.

Will update it in v15.

> 
> > +                       return -ENOMEM;
> > +               }
> > +
> > +               vring->va = va;
> > +               vring->dma = dma;
> > +       }
> > +
> > +       return 0;
> > +}
> 
> > +/* Interrupt handler. */
> > +static irqreturn_t mlxbf_tmfifo_irq_handler(int irq, void *arg)
> > +{
> > +       struct mlxbf_tmfifo_irq_info *irq_info = arg;
> > +
> 
> > +       if (irq_info->index < MLXBF_TM_MAX_IRQ &&
> 
> On which circumstances this is possible?

Yes, Not needed at all. Will update it in v15.

> 
> > +           !test_and_set_bit(irq_info->index, &irq_info->fifo->pend_events))
> > +               schedule_work(&irq_info->fifo->work);
> > +
> > +       return IRQ_HANDLED;
> > +}
> 
> > +static void mlxbf_tmfifo_release_pending_pkt(struct mlxbf_tmfifo_vring *vring)
> > +{
> > +       struct vring_desc *desc_head;
> > +       u32 len = 0;
> > +
> > +       if (vring->desc_head) {
> > +               desc_head = vring->desc_head;
> > +               len = vring->pkt_len;
> > +       } else {
> > +               desc_head = mlxbf_tmfifo_get_next_desc(vring);
> 
> > +               if (desc_head)
> 
> Redundant...

Will update it in v15.

> 
> > +                       len = mlxbf_tmfifo_get_pkt_len(vring, desc_head);
> 
> ...this is NULL-aware AFAICS.

Yes, it is.

> 
> > +       }
> > +
> > +       if (desc_head)
> > +               mlxbf_tmfifo_release_desc(vring, desc_head, len);
> > +
> > +       vring->pkt_len = 0;
> > +       vring->desc = NULL;
> > +       vring->desc_head = NULL;
> > +}
> 
> > +/* The notify function is called when new buffers are posted. */
> > +static bool mlxbf_tmfifo_virtio_notify(struct virtqueue *vq)
> > +{
> > +       struct mlxbf_tmfifo_vring *vring = vq->priv;
> > +       struct mlxbf_tmfifo_vdev *tm_vdev;
> > +       struct mlxbf_tmfifo *fifo;
> > +       unsigned long flags;
> > +
> > +       fifo = vring->fifo;
> > +
> > +       /*
> > +        * Virtio maintains vrings in pairs, even number ring for Rx
> > +        * and odd number ring for Tx.
> > +        */
> 
> > +       if (!(vring->index & BIT(0))) {
> 
> Perhaps positive conditional is better.

Will update it in v15.

> 
> > +               if (test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events))
> > +                       return true;
> > +       } else {
> > +               /*
> > +                * Console could make blocking call with interrupts disabled.
> > +                * In such case, the vring needs to be served right away. For
> > +                * other cases, just set the TX LWM bit to start Tx in the
> > +                * worker handler.
> > +                */
> > +               if (vring->vdev_id == VIRTIO_ID_CONSOLE) {
> > +                       spin_lock_irqsave(&fifo->spin_lock, flags);
> > +                       tm_vdev = fifo->vdev[VIRTIO_ID_CONSOLE];
> > +                       mlxbf_tmfifo_console_output(tm_vdev, vring);
> > +                       spin_unlock_irqrestore(&fifo->spin_lock, flags);
> > +               } else if (test_and_set_bit(MLXBF_TM_TX_LWM_IRQ,
> > +                                           &fifo->pend_events)) {
> > +                       return true;
> > +               }
> > +       }
> > +
> > +       schedule_work(&fifo->work);
> > +
> > +       return true;
> > +}
> 
> > +/* Read the value of a configuration field. */
> > +static void mlxbf_tmfifo_virtio_get(struct virtio_device *vdev,
> > +                                   unsigned int offset,
> > +                                   void *buf,
> > +                                   unsigned int len)
> > +{
> > +       struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
> > +
> 
> > +       if (offset + len > sizeof(tm_vdev->config))
> > +               return;
> 
> This doesn't protect against too big len and offset.
> Same for other similar checks.

Will revise it in v15 like "if ((u64)offset + len > sizeof(tm_vdev->config))"
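
i.e. something like this (sketch only, same field names as v14):

	static void mlxbf_tmfifo_virtio_get(struct virtio_device *vdev,
					    unsigned int offset,
					    void *buf,
					    unsigned int len)
	{
		struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);

		/* The u64 cast avoids wrap-around of 'offset + len'. */
		if ((u64)offset + len > sizeof(tm_vdev->config))
			return;

		memcpy(buf, (u8 *)&tm_vdev->config + offset, len);
	}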

> 
> > +
> > +       memcpy(buf, (u8 *)&tm_vdev->config + offset, len);
> > +}
> 
> > +/* Read the configured network MAC address from efi variable. */
> > +static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
> > +{
> > +       efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
> > +       unsigned long size = ETH_ALEN;
> > +       efi_status_t status;
> > +       u8 buf[ETH_ALEN];
> > +
> 
> > +       status = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size,
> > +                                 buf);
> 
> Use one line.

Will update it in v15.

> 
> > +       if (status == EFI_SUCCESS && size == ETH_ALEN)
> > +               ether_addr_copy(mac, buf);
> > +       else
> > +               ether_addr_copy(mac, mlxbf_tmfifo_net_default_mac);
> > +}
> 
> > +/* Probe the TMFIFO. */
> > +static int mlxbf_tmfifo_probe(struct platform_device *pdev)
> > +{
> 
> > +       res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > +       fifo->rx_base = devm_ioremap_resource(dev, res);
> 
> There is new helper devm_platform_ioremap_resource().
> Please, use it instead.

Will update it in v15.
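
For reference, the replacement collapses the two calls into one; this is what
v15 below ends up doing for both the Rx and Tx resources:

	-	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
	-	fifo->rx_base = devm_ioremap_resource(dev, res);
	+	fifo->rx_base = devm_platform_ioremap_resource(pdev, 0);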

> 
> > +       if (IS_ERR(fifo->rx_base))
> > +               return PTR_ERR(fifo->rx_base);
> 
> > +       res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
> > +       fifo->tx_base = devm_ioremap_resource(dev, res);
> 
> Ditto.

Will update it in v15.

> 
> > +       if (IS_ERR(fifo->tx_base))
> > +               return PTR_ERR(fifo->tx_base);
> 
> > +}
> 
> --
> With Best Regards,
> Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* RE: [PATCH v13] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
  2019-04-11 14:13         ` Andy Shevchenko
@ 2019-04-12 16:15           ` Liming Sun
  0 siblings, 0 replies; 30+ messages in thread
From: Liming Sun @ 2019-04-12 16:15 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak,
	Linux Kernel Mailing List, Platform Driver

Thanks Andy. Please see my response below for this email as well.

- Liming

> -----Original Message-----
> From: Andy Shevchenko <andy.shevchenko@gmail.com>
> Sent: Thursday, April 11, 2019 10:13 AM
> To: Liming Sun <lsun@mellanox.com>
> Cc: David Woods <dwoods@mellanox.com>; Andy Shevchenko <andy@infradead.org>; Darren Hart <dvhart@infradead.org>; Vadim
> Pasternak <vadimp@mellanox.com>; Linux Kernel Mailing List <linux-kernel@vger.kernel.org>; Platform Driver <platform-driver-
> x86@vger.kernel.org>
> Subject: Re: [PATCH v13] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
> 
> On Sun, Apr 7, 2019 at 5:05 AM Liming Sun <lsun@mellanox.com> wrote:
> 
> > > > + * mlxbf_tmfifo_msg_hdr - Structure of the TmFifo message header
> > > > + * @type: message type
> > > > + * @len: payload length
> > > > + * @u: 64-bit union data
> > > > + */
> > > > +union mlxbf_tmfifo_msg_hdr {
> > > > +       struct {
> > > > +               u8 type;
> > > > +               __be16 len;
> > > > +               u8 unused[5];
> > > > +       } __packed;
> > > > +       u64 data;
> > >
> > > I'm not sure I understand how you can distinguish which field of union to use?
> > > Isn't here some type missed?
> >
> > Updated the comment in v14.
> >
> > This message header is a union of struct and u64 data.
> > The 'struct' has
> > type and length field which are used to encode & decode the message.
> > The 'data' field is used to read/write the message header from/to the FIFO.
> 
> Something fishy here.
> 
> You are using a structure of data which you would like to write with
> one call? Perhaps you need to construct this on-the-fly.

Looks like "union causes confusion".
I will update it in v15 to construct it on-the-fly as suggested.

> Moreover, the __be member is used in data which is written as LE.
> This needs more explanation.

Will update the comment for it in v15.  Below are some explanation for it.

The 'LE' is for the low-level mmio transport layer. The SoC sends a data stream
into the FIFO and the other side reads it; the byte order of that stream is
preserved as-is. The "__be16" is for the driver or application on both sides
to agree on how to decode the 'length' field.

For example, the SoC side (little endian) sends a message with the
8-byte message header "01 02 03 04 05 06 07 08" into the FIFO. The other
side (say, a big-endian host machine using USB bulk transfer) reads the
same byte stream and tries to decode it with mlxbf_tmfifo_msg_hdr.
Without the "__be16" conversion, the SoC side would read it as
"type=1, length=0x0302" while the big-endian host side would read
"type=1, length=0x0203".

> 
> --
> With Best Regards,
> Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v15] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
       [not found] <b143b40446c1870fb8d422b364ead95d54552be9.1527264077.git.lsun@mellanox.com>
                   ` (9 preceding siblings ...)
  2019-04-07  2:03 ` [PATCH v14] " Liming Sun
@ 2019-04-12 17:30 ` Liming Sun
  2019-05-03 13:49 ` [PATCH v16] " Liming Sun
  11 siblings, 0 replies; 30+ messages in thread
From: Liming Sun @ 2019-04-12 17:30 UTC (permalink / raw)
  To: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak
  Cc: Liming Sun, linux-kernel, platform-driver-x86

This commit adds the TmFifo platform driver for Mellanox BlueField
Soc. TmFifo is a shared FIFO which enables an external host machine
to exchange data with the SoC via USB or PCIe. The driver is based
on virtio framework and has console and network access enabled.

Reviewed-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Liming Sun <lsun@mellanox.com>
---
v14->v15:
    Fixes for comments from Andy:
    - Remove the 'union' definition of mlxbf_tmfifo_msg_hdr and use
      on-the-fly conversion when sending the 8-byte message header
      into the FIFO;
    - Update comment of mlxbf_tmfifo_msg_hdr explaining why '__be16'
      is needed for the 'len' field. The SoC sends data stream into
      the FIFO and the other side reads it. The byte order of the data
      stream (byte-stream) stays the same. The 'len' field is encoded
      into network byte order so external host machine with different
      endianness could decode it. The implementation has been verified
      over USB with an external PPC host machine running in big-endian
      mode.
    - Move the 'dev_err()' line to the end of the block in function
      mlxbf_tmfifo_alloc_vrings();
    - Remove the 'irq_info->index < MLXBF_TM_MAX_IRQ' check in
      mlxbf_tmfifo_irq_handler() since it's unnecessary;
    - Remove the 'if (desc_head)' check in
      mlxbf_tmfifo_release_pending_pkt() since function
      mlxbf_tmfifo_get_pkt_len() is already NULL-aware;
    - Adjust the testing order of 'if (!(vring->index & BIT(0)))'
      in bool mlxbf_tmfifo_virtio_notify() to test the positive case
      'if (vring->index & BIT(0))' first;
    - Add '(u64)offset' conversion in mlxbf_tmfifo_virtio_get() to
      avoid 32-bit length addition overflow;
    - Update the 'efi.get_variable' statement into single line in
      mlxbf_tmfifo_get_cfg_mac();
    - Use new helper devm_platform_ioremap_resource() to replace
      'platform_get_resource() + devm_ioremap_resource()' in
      mlxbf_tmfifo_probe();
v13->v14:
    Fixes for comments from Andy:
    - Add a blank line to separate the virtio header files;
    - Update the comment for 'union mlxbf_tmfifo_msg_hdr' to be
      more clear how this union is used;
    - Update the 'mlxbf_tmfifo_net_default_mac[ETH_ALEN]' definition
      to be two lines;
    - Reformat macro MLXBF_TMFIFO_NET_FEATURES to put the definition
      on a separate line;
    - Update all 'fifo' to 'FIFO' in the comments;
    - Update mlxbf_tmfifo_alloc_vrings() to specifically release the
      allocated entries in case of failures, so the logic looks more
      clear. In the caller function the mlxbf_tmfifo_free_vrings()
      might be called again in case of other failures, which is ok
      since the 'va' pointer will be set to NULL once released;
    - Update mlxbf_tmfifo_timer() to change the first statement to
      one line;
    - Update one memcpy() to ether_addr_copy() in
      mlxbf_tmfifo_get_cfg_mac();
    - Remove 'fifo->pdev' since it is really not needed;
    - Define temporary variable to update the mlxbf_tmfifo_create_vdev()
      statement into single line.
    New changes by Liming:
    - Reorder the logic a little bit in mlxbf_tmfifo_timer(). Previously
      it has logic like "!a || !b" while the '!b' will not be evaluated
      if '!a' is true. It was changed to this way during review, but is
      actually not the desired behavior since both bits need to be
      tested/set in fifo->pend_events. This issue, found during
      verification, caused extra delays for Tx packets.
v12->v13:
    Rebase and resubmit (no new changes).
v11->v12:
    Fixed the two unresolved comments from v11.
    - "Change macro mlxbf_vdev_to_tmfifo() to one line"
      Done. Seems not hard.
    - "Is it appropriate use of devm_* for 'tm_vdev = devm_kzalloc'"
      Yes, understand the comment now. The tmfifo is fixed, but the
      vdev is dynamic. Use kzalloc() instead, and free the device
      in the release callback which is the right place for it.
v10->v11:
    Fixes for comments from Andy:
    - Use GENMASK_ULL() instead of GENMASK() in mlxbf-tmfifo-regs.h
    - Removed the cpu_to_le64()/le64_to_cpu() conversion since
      readq()/writeq() already takes care of it.
    - Remove the "if (irq)" check in mlxbf_tmfifo_disable_irqs().
    - Add "u32 count" temp variable in mlxbf_tmfifo_get_tx_avail().
    - Clean up mlxbf_tmfifo_get_cfg_mac(), use ETH_ALEN instead of
      value 6.
    - Change the tx_buf to use Linux existing 'struct circ_buf'.
    Comment not applied:
    - "Change macro mlxbf_vdev_to_tmfifo() to one line"
      Couldn't fit on one line within 80 characters
    - "Is it appropriate use of devm_* for 'tm_vdev = devm_kzalloc'"
      This is an SoC; the device won't be closed or detached.
      The only case is when the driver is unloaded. So it appears
      ok to use devm_kzalloc() since it's allocated during probe()
      and released during module unload.
    Comments from Vadim: OK
v9->v10:
    Fixes for comments from Andy:
    - Use devm_ioremap_resource() instead of devm_ioremap().
    - Use kernel-doc comments.
    - Keep Makefile contents sorted.
    - Use same fixed format for offsets.
    - Use SZ_1K/SZ_32K instead of 1024/32*1024.
    - Remove unnecessary comments.
    - Use one style for max numbers.
    - More comments for mlxbf_tmfifo_vdev and mlxbf_tmfifo_data_64bit.
    - Use globally defined MTU instead of new definition.
    - Remove forward declaration of mlxbf_tmfifo_remove().
    - Remove PAGE_ALIGN() for dma_alloc_coherent)().
    - Remove the cast of "struct vring *".
    - Check return result of test_and_set_bit().
    - Add a macro mlxbf_vdev_to_tmfifo().
    - Several other minor coding style comments.
    Comment not applied:
    - "Shouldn't be rather helper in EFI lib in kernel"
      Looks like efi.get_variable() is the way I found in the kernel
      tree.
    - "this one is not protected anyhow? Potential race condition"
      In mlxbf_tmfifo_console_tx(), the spin-lock is used to protect the
      'tx_buf' only, not the FIFO writes. So there is no race condition.
    - "Is __packed needed in mlxbf_tmfifo_msg_hdr".
      Yes, it is needed to make sure the structure is 8 bytes.
    Fixes for comments from Vadim:
    - Use tab in mlxbf-tmfifo-regs.h
    - Use kernel-doc comments for struct mlxbf_tmfifo_msg_hdr and
      mlxbf_tmfifo_irq_info as well.
    - Use _MAX instead of _CNT in the macro definition to be consistent.
    - Fix the MODULE_LICENSE.
    - Use BIT_ULL() instead of BIT().
    - Remove argument of 'avail' for mlxbf_tmfifo_rxtx_header() and
      mlxbf_tmfifo_rxtx_word()
    - Revise logic in mlxbf_tmfifo_rxtx_one_desc() to remove the
      WARN_ON().
    - Change "union mlxbf_tmfifo_u64 u" to "union mlxbf_tmfifo_u64 buf"
      in mlxbf_tmfifo_rxtx_word().
    - Change data type of vring_change from 'int' to 'bool'.
    - Remove the blank lines after Signed-off.
    - Don’t use declaration in the middle.
    - Make the network header initialization more elegant.
    - Change label done to mlxbf_tmfifo_desc_done.
    - Remove some unnecessary comments, and several other misc coding
      style comments.
    - Simplify code logic in mlxbf_tmfifo_virtio_notify()
    New changes by Liming:
    - Simplify the Rx/Tx function arguments to make it more readable.
v8->v9:
    Fixes for comments from Andy:
    - Use modern devm_xxx() API instead.
    Fixes for comments from Vadim:
    - Split the Rx/Tx function into smaller functions.
    - File name, copyright information.
    - Function and variable name conversion.
    - Local variable and indent coding styles.
    - Remove unnecessary 'inline' declarations.
    - Use devm_xxx() APIs.
    - Move the efi_char16_t MAC address definition to global.
    - Fix warnings reported by 'checkpatch --strict'.
    - Fix warnings reported by 'make CF="-D__CHECK_ENDIAN__"'.
    - Change 'select VIRTIO_xxx' to 'depends on VIRTIO_xxx' in Kconfig.
    - Merge mlxbf_tmfifo_vdev_tx_buf_push() and
      mlxbf_tmfifo_vdev_tx_buf_pop().
    - Add union to avoid casting between __le64 and u64.
    - Several other misc coding style comments.
    New changes by Liming:
    - Removed the DT binding documentation since only ACPI is
      supported for now by UEFI on the SoC.
v8: Re-submit under drivers/platform/mellanox for the target-side
    platform driver only.
v7: Added host side drivers into the same patch set.
v5~v6: Coding style fix.
v1~v4: Initial version for directory drivers/soc/mellanox.
---
 drivers/platform/mellanox/Kconfig             |   12 +-
 drivers/platform/mellanox/Makefile            |    1 +
 drivers/platform/mellanox/mlxbf-tmfifo-regs.h |   63 ++
 drivers/platform/mellanox/mlxbf-tmfifo.c      | 1281 +++++++++++++++++++++++++
 4 files changed, 1356 insertions(+), 1 deletion(-)
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo-regs.h
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo.c

diff --git a/drivers/platform/mellanox/Kconfig b/drivers/platform/mellanox/Kconfig
index cd8a908..530fe7e 100644
--- a/drivers/platform/mellanox/Kconfig
+++ b/drivers/platform/mellanox/Kconfig
@@ -5,7 +5,7 @@
 
 menuconfig MELLANOX_PLATFORM
 	bool "Platform support for Mellanox hardware"
-	depends on X86 || ARM || COMPILE_TEST
+	depends on X86 || ARM || ARM64 || COMPILE_TEST
 	---help---
 	  Say Y here to get to see options for platform support for
 	  Mellanox systems. This option alone does not add any kernel code.
@@ -34,4 +34,14 @@ config MLXREG_IO
 	  to system resets operation, system reset causes monitoring and some
 	  kinds of mux selection.
 
+config MLXBF_TMFIFO
+	tristate "Mellanox BlueField SoC TmFifo platform driver"
+	depends on ARM64
+	depends on ACPI
+	depends on VIRTIO_CONSOLE && VIRTIO_NET
+	help
+	  Say y here to enable TmFifo support. The TmFifo driver provides
+          platform driver support for the TmFifo which supports console
+          and networking based on the virtio framework.
+
 endif # MELLANOX_PLATFORM
diff --git a/drivers/platform/mellanox/Makefile b/drivers/platform/mellanox/Makefile
index 57074d9c..a229bda1 100644
--- a/drivers/platform/mellanox/Makefile
+++ b/drivers/platform/mellanox/Makefile
@@ -3,5 +3,6 @@
 # Makefile for linux/drivers/platform/mellanox
 # Mellanox Platform-Specific Drivers
 #
+obj-$(CONFIG_MLXBF_TMFIFO)	+= mlxbf-tmfifo.o
 obj-$(CONFIG_MLXREG_HOTPLUG)	+= mlxreg-hotplug.o
 obj-$(CONFIG_MLXREG_IO) += mlxreg-io.o
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo-regs.h b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
new file mode 100644
index 0000000..e4f0d2e
--- /dev/null
+++ b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
+ */
+
+#ifndef __MLXBF_TMFIFO_REGS_H__
+#define __MLXBF_TMFIFO_REGS_H__
+
+#include <linux/types.h>
+#include <linux/bits.h>
+
+#define MLXBF_TMFIFO_TX_DATA				0x00
+#define MLXBF_TMFIFO_TX_STS				0x08
+#define MLXBF_TMFIFO_TX_STS__LENGTH			0x0001
+#define MLXBF_TMFIFO_TX_STS__COUNT_SHIFT		0
+#define MLXBF_TMFIFO_TX_STS__COUNT_WIDTH		9
+#define MLXBF_TMFIFO_TX_STS__COUNT_RESET_VAL		0
+#define MLXBF_TMFIFO_TX_STS__COUNT_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_TX_STS__COUNT_MASK			GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_TX_CTL				0x10
+#define MLXBF_TMFIFO_TX_CTL__LENGTH			0x0001
+#define MLXBF_TMFIFO_TX_CTL__LWM_SHIFT			0
+#define MLXBF_TMFIFO_TX_CTL__LWM_WIDTH			8
+#define MLXBF_TMFIFO_TX_CTL__LWM_RESET_VAL		128
+#define MLXBF_TMFIFO_TX_CTL__LWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__LWM_MASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__HWM_SHIFT			8
+#define MLXBF_TMFIFO_TX_CTL__HWM_WIDTH			8
+#define MLXBF_TMFIFO_TX_CTL__HWM_RESET_VAL		128
+#define MLXBF_TMFIFO_TX_CTL__HWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__HWM_MASK			GENMASK_ULL(15, 8)
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_SHIFT		32
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_WIDTH		9
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RESET_VAL	256
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK		GENMASK_ULL(40, 32)
+#define MLXBF_TMFIFO_RX_DATA				0x00
+#define MLXBF_TMFIFO_RX_STS				0x08
+#define MLXBF_TMFIFO_RX_STS__LENGTH			0x0001
+#define MLXBF_TMFIFO_RX_STS__COUNT_SHIFT		0
+#define MLXBF_TMFIFO_RX_STS__COUNT_WIDTH		9
+#define MLXBF_TMFIFO_RX_STS__COUNT_RESET_VAL		0
+#define MLXBF_TMFIFO_RX_STS__COUNT_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_RX_STS__COUNT_MASK			GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_RX_CTL				0x10
+#define MLXBF_TMFIFO_RX_CTL__LENGTH			0x0001
+#define MLXBF_TMFIFO_RX_CTL__LWM_SHIFT			0
+#define MLXBF_TMFIFO_RX_CTL__LWM_WIDTH			8
+#define MLXBF_TMFIFO_RX_CTL__LWM_RESET_VAL		128
+#define MLXBF_TMFIFO_RX_CTL__LWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__LWM_MASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__HWM_SHIFT			8
+#define MLXBF_TMFIFO_RX_CTL__HWM_WIDTH			8
+#define MLXBF_TMFIFO_RX_CTL__HWM_RESET_VAL		128
+#define MLXBF_TMFIFO_RX_CTL__HWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__HWM_MASK			GENMASK_ULL(15, 8)
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_SHIFT		32
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_WIDTH		9
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RESET_VAL	256
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK		GENMASK_ULL(40, 32)
+
+#endif /* !defined(__MLXBF_TMFIFO_REGS_H__) */
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo.c b/drivers/platform/mellanox/mlxbf-tmfifo.c
new file mode 100644
index 0000000..9a5c9fd
--- /dev/null
+++ b/drivers/platform/mellanox/mlxbf-tmfifo.c
@@ -0,0 +1,1281 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Mellanox BlueField SoC TmFifo driver
+ *
+ * Copyright (C) 2019 Mellanox Technologies
+ */
+
+#include <linux/acpi.h>
+#include <linux/bitfield.h>
+#include <linux/circ_buf.h>
+#include <linux/efi.h>
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/platform_device.h>
+#include <linux/types.h>
+
+#include <linux/virtio_config.h>
+#include <linux/virtio_console.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_net.h>
+#include <linux/virtio_ring.h>
+
+#include "mlxbf-tmfifo-regs.h"
+
+/* Vring size. */
+#define MLXBF_TMFIFO_VRING_SIZE			SZ_1K
+
+/* Console Tx buffer size. */
+#define MLXBF_TMFIFO_CON_TX_BUF_SIZE		SZ_32K
+
+/* Console Tx buffer reserved space. */
+#define MLXBF_TMFIFO_CON_TX_BUF_RSV_SIZE	8
+
+/* House-keeping timer interval. */
+#define MLXBF_TMFIFO_TIMER_INTERVAL		(HZ / 10)
+
+/* Virtual devices sharing the TM FIFO. */
+#define MLXBF_TMFIFO_VDEV_MAX		(VIRTIO_ID_CONSOLE + 1)
+
+/*
+ * Reserve 1/16 of TmFifo space, so console messages are not starved by
+ * the networking traffic.
+ */
+#define MLXBF_TMFIFO_RESERVE_RATIO		16
+
+/* Message with data needs at least two words (for header & data). */
+#define MLXBF_TMFIFO_DATA_MIN_WORDS		2
+
+struct mlxbf_tmfifo;
+
+/**
+ * mlxbf_tmfifo_vring - Structure of the TmFifo virtual ring
+ * @va: virtual address of the ring
+ * @dma: dma address of the ring
+ * @vq: pointer to the virtio virtqueue
+ * @desc: current descriptor of the pending packet
+ * @desc_head: head descriptor of the pending packet
+ * @cur_len: processed length of the current descriptor
+ * @rem_len: remaining length of the pending packet
+ * @pkt_len: total length of the pending packet
+ * @next_avail: next avail descriptor id
+ * @num: vring size (number of descriptors)
+ * @align: vring alignment size
+ * @index: vring index
+ * @vdev_id: vring virtio id (VIRTIO_ID_xxx)
+ * @fifo: pointer to the tmfifo structure
+ */
+struct mlxbf_tmfifo_vring {
+	void *va;
+	dma_addr_t dma;
+	struct virtqueue *vq;
+	struct vring_desc *desc;
+	struct vring_desc *desc_head;
+	int cur_len;
+	int rem_len;
+	u32 pkt_len;
+	u16 next_avail;
+	int num;
+	int align;
+	int index;
+	int vdev_id;
+	struct mlxbf_tmfifo *fifo;
+};
+
+/* Interrupt types. */
+enum {
+	MLXBF_TM_RX_LWM_IRQ,
+	MLXBF_TM_RX_HWM_IRQ,
+	MLXBF_TM_TX_LWM_IRQ,
+	MLXBF_TM_TX_HWM_IRQ,
+	MLXBF_TM_MAX_IRQ
+};
+
+/* Ring types (Rx & Tx). */
+enum {
+	MLXBF_TMFIFO_VRING_RX,
+	MLXBF_TMFIFO_VRING_TX,
+	MLXBF_TMFIFO_VRING_MAX
+};
+
+/**
+ * mlxbf_tmfifo_vdev - Structure of the TmFifo virtual device
+ * @vdev: virtio device, in which the vdev.id.device field has the
+ *        VIRTIO_ID_xxx id to distinguish the virtual device.
+ * @status: status of the device
+ * @features: supported features of the device
+ * @vrings: array of tmfifo vrings of this device
+ * @config.cons: virtual console config -
+ *               select if vdev.id.device is VIRTIO_ID_CONSOLE
+ * @config.net: virtual network config -
+ *              select if vdev.id.device is VIRTIO_ID_NET
+ * @tx_buf: tx buffer used to buffer data before writing into the FIFO
+ */
+struct mlxbf_tmfifo_vdev {
+	struct virtio_device vdev;
+	u8 status;
+	u64 features;
+	struct mlxbf_tmfifo_vring vrings[MLXBF_TMFIFO_VRING_MAX];
+	union {
+		struct virtio_console_config cons;
+		struct virtio_net_config net;
+	} config;
+	struct circ_buf tx_buf;
+};
+
+/**
+ * mlxbf_tmfifo_irq_info - Structure of the interrupt information
+ * @fifo: pointer to the tmfifo structure
+ * @irq: interrupt number
+ * @index: index into the interrupt array
+ */
+struct mlxbf_tmfifo_irq_info {
+	struct mlxbf_tmfifo *fifo;
+	int irq;
+	int index;
+};
+
+/**
+ * mlxbf_tmfifo - Structure of the TmFifo
+ * @vdev: array of the virtual devices running over the TmFifo
+ * @lock: lock to protect the TmFifo access
+ * @rx_base: mapped register base address for the Rx FIFO
+ * @tx_base: mapped register base address for the Tx FIFO
+ * @rx_fifo_size: number of entries of the Rx FIFO
+ * @tx_fifo_size: number of entries of the Tx FIFO
+ * @pend_events: pending bits for deferred events
+ * @irq_info: interrupt information
+ * @work: work struct for deferred process
+ * @timer: background timer
+ * @vring: Tx/Rx ring
+ * @spin_lock: spin lock
+ * @is_ready: ready flag
+ */
+struct mlxbf_tmfifo {
+	struct mlxbf_tmfifo_vdev *vdev[MLXBF_TMFIFO_VDEV_MAX];
+	struct mutex lock;		/* TmFifo lock */
+	void __iomem *rx_base;
+	void __iomem *tx_base;
+	int rx_fifo_size;
+	int tx_fifo_size;
+	unsigned long pend_events;
+	struct mlxbf_tmfifo_irq_info irq_info[MLXBF_TM_MAX_IRQ];
+	struct work_struct work;
+	struct timer_list timer;
+	struct mlxbf_tmfifo_vring *vring[2];
+	spinlock_t spin_lock;		/* spin lock */
+	bool is_ready;
+};
+
+/**
+ * mlxbf_tmfifo_msg_hdr - Structure of the TmFifo message header
+ * @type: message type
+ * @len: payload length in network byte order. Messages sent into the FIFO
+ *       will be read by the other side as a data stream in the same byte
+ *       order. The length needs to be encoded in network byte order so
+ *       both sides can understand it.
+ */
+struct mlxbf_tmfifo_msg_hdr {
+	u8 type;
+	__be16 len;
+	u8 unused[5];
+} __packed __aligned(sizeof(u64));
+
+/*
+ * Default MAC.
+ * This MAC address will be read from EFI persistent variable if configured.
+ * It can also be reconfigured with standard Linux tools.
+ */
+static u8 mlxbf_tmfifo_net_default_mac[ETH_ALEN] = {
+	0x00, 0x1A, 0xCA, 0xFF, 0xFF, 0x01
+};
+
+/* EFI variable name of the MAC address. */
+static efi_char16_t mlxbf_tmfifo_efi_name[] = L"RshimMacAddr";
+
+/* Maximum L2 header length. */
+#define MLXBF_TMFIFO_NET_L2_OVERHEAD	36
+
+/* Supported virtio-net features. */
+#define MLXBF_TMFIFO_NET_FEATURES \
+	(BIT_ULL(VIRTIO_NET_F_MTU) | BIT_ULL(VIRTIO_NET_F_STATUS) | \
+	 BIT_ULL(VIRTIO_NET_F_MAC))
+
+#define mlxbf_vdev_to_tmfifo(d) container_of(d, struct mlxbf_tmfifo_vdev, vdev)
+
+/* Free vrings of the FIFO device. */
+static void mlxbf_tmfifo_free_vrings(struct mlxbf_tmfifo *fifo,
+				     struct mlxbf_tmfifo_vdev *tm_vdev)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	int i, size;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+		if (vring->va) {
+			size = vring_size(vring->num, vring->align);
+			dma_free_coherent(tm_vdev->vdev.dev.parent, size,
+					  vring->va, vring->dma);
+			vring->va = NULL;
+			if (vring->vq) {
+				vring_del_virtqueue(vring->vq);
+				vring->vq = NULL;
+			}
+		}
+	}
+}
+
+/* Allocate vrings for the FIFO. */
+static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
+				     struct mlxbf_tmfifo_vdev *tm_vdev)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	struct device *dev;
+	dma_addr_t dma;
+	int i, size;
+	void *va;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+		vring->fifo = fifo;
+		vring->num = MLXBF_TMFIFO_VRING_SIZE;
+		vring->align = SMP_CACHE_BYTES;
+		vring->index = i;
+		vring->vdev_id = tm_vdev->vdev.id.device;
+		dev = &tm_vdev->vdev.dev;
+
+		size = vring_size(vring->num, vring->align);
+		va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL);
+		if (!va) {
+			mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
+			dev_err(dev->parent, "dma_alloc_coherent failed\n");
+			return -ENOMEM;
+		}
+
+		vring->va = va;
+		vring->dma = dma;
+	}
+
+	return 0;
+}
+
+/* Disable interrupts of the FIFO device. */
+static void mlxbf_tmfifo_disable_irqs(struct mlxbf_tmfifo *fifo)
+{
+	int i, irq;
+
+	for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
+		irq = fifo->irq_info[i].irq;
+		fifo->irq_info[i].irq = 0;
+		disable_irq(irq);
+	}
+}
+
+/* Interrupt handler. */
+static irqreturn_t mlxbf_tmfifo_irq_handler(int irq, void *arg)
+{
+	struct mlxbf_tmfifo_irq_info *irq_info = arg;
+
+	if (!test_and_set_bit(irq_info->index, &irq_info->fifo->pend_events))
+		schedule_work(&irq_info->fifo->work);
+
+	return IRQ_HANDLED;
+}
+
+/* Get the next packet descriptor from the vring. */
+static struct vring_desc *
+mlxbf_tmfifo_get_next_desc(struct mlxbf_tmfifo_vring *vring)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	unsigned int idx, head;
+
+	if (vring->next_avail == virtio16_to_cpu(vdev, vr->avail->idx))
+		return NULL;
+
+	idx = vring->next_avail % vr->num;
+	head = virtio16_to_cpu(vdev, vr->avail->ring[idx]);
+	if (WARN_ON(head >= vr->num))
+		return NULL;
+
+	vring->next_avail++;
+
+	return &vr->desc[head];
+}
+
+/* Release virtio descriptor. */
+static void mlxbf_tmfifo_release_desc(struct mlxbf_tmfifo_vring *vring,
+				      struct vring_desc *desc, u32 len)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	u16 idx, vr_idx;
+
+	vr_idx = virtio16_to_cpu(vdev, vr->used->idx);
+	idx = vr_idx % vr->num;
+	vr->used->ring[idx].id = cpu_to_virtio32(vdev, desc - vr->desc);
+	vr->used->ring[idx].len = cpu_to_virtio32(vdev, len);
+
+	/*
+	 * Virtio could poll and check the 'idx' to decide whether the desc is
+	 * done or not. Add a memory barrier here to make sure the update above
+	 * completes before updating the idx.
+	 */
+	mb();
+	vr->used->idx = cpu_to_virtio16(vdev, vr_idx + 1);
+}
+
+/* Get the total length of the descriptor chain. */
+static u32 mlxbf_tmfifo_get_pkt_len(struct mlxbf_tmfifo_vring *vring,
+				    struct vring_desc *desc)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	u32 len = 0, idx;
+
+	while (desc) {
+		len += virtio32_to_cpu(vdev, desc->len);
+		if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
+			break;
+		idx = virtio16_to_cpu(vdev, desc->next);
+		desc = &vr->desc[idx];
+	}
+
+	return len;
+}
+
+static void mlxbf_tmfifo_release_pending_pkt(struct mlxbf_tmfifo_vring *vring)
+{
+	struct vring_desc *desc_head;
+	u32 len = 0;
+
+	if (vring->desc_head) {
+		desc_head = vring->desc_head;
+		len = vring->pkt_len;
+	} else {
+		desc_head = mlxbf_tmfifo_get_next_desc(vring);
+		len = mlxbf_tmfifo_get_pkt_len(vring, desc_head);
+	}
+
+	if (desc_head)
+		mlxbf_tmfifo_release_desc(vring, desc_head, len);
+
+	vring->pkt_len = 0;
+	vring->desc = NULL;
+	vring->desc_head = NULL;
+}
+
+static void mlxbf_tmfifo_init_net_desc(struct mlxbf_tmfifo_vring *vring,
+				       struct vring_desc *desc, bool is_rx)
+{
+	struct virtio_device *vdev = vring->vq->vdev;
+	struct virtio_net_hdr *net_hdr;
+
+	net_hdr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+	memset(net_hdr, 0, sizeof(*net_hdr));
+}
+
+/* Get and initialize the next packet. */
+static struct vring_desc *
+mlxbf_tmfifo_get_next_pkt(struct mlxbf_tmfifo_vring *vring, bool is_rx)
+{
+	struct vring_desc *desc;
+
+	desc = mlxbf_tmfifo_get_next_desc(vring);
+	if (desc && is_rx && vring->vdev_id == VIRTIO_ID_NET)
+		mlxbf_tmfifo_init_net_desc(vring, desc, is_rx);
+
+	vring->desc_head = desc;
+	vring->desc = desc;
+
+	return desc;
+}
+
+/* House-keeping timer. */
+static void mlxbf_tmfifo_timer(struct timer_list *t)
+{
+	struct mlxbf_tmfifo *fifo = container_of(t, struct mlxbf_tmfifo, timer);
+	int rx, tx;
+
+	rx = !test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events);
+	tx = !test_and_set_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events);
+
+	if (rx || tx)
+		schedule_work(&fifo->work);
+
+	mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
+}
+
+/* Copy one console packet into the output buffer. */
+static void mlxbf_tmfifo_console_output_one(struct mlxbf_tmfifo_vdev *cons,
+					    struct mlxbf_tmfifo_vring *vring,
+					    struct vring_desc *desc)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = &cons->vdev;
+	u32 len, idx, seg;
+	void *addr;
+
+	while (desc) {
+		addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+		len = virtio32_to_cpu(vdev, desc->len);
+
+		seg = CIRC_SPACE_TO_END(cons->tx_buf.head, cons->tx_buf.tail,
+					MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+		if (len <= seg) {
+			memcpy(cons->tx_buf.buf + cons->tx_buf.head, addr, len);
+		} else {
+			memcpy(cons->tx_buf.buf + cons->tx_buf.head, addr, seg);
+			addr += seg;
+			memcpy(cons->tx_buf.buf, addr, len - seg);
+		}
+		cons->tx_buf.head = (cons->tx_buf.head + len) %
+			MLXBF_TMFIFO_CON_TX_BUF_SIZE;
+
+		if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
+			break;
+		idx = virtio16_to_cpu(vdev, desc->next);
+		desc = &vr->desc[idx];
+	}
+}
+
+/* Copy console data into the output buffer. */
+static void mlxbf_tmfifo_console_output(struct mlxbf_tmfifo_vdev *cons,
+					struct mlxbf_tmfifo_vring *vring)
+{
+	struct vring_desc *desc;
+	u32 len, avail;
+
+	desc = mlxbf_tmfifo_get_next_desc(vring);
+	while (desc) {
+		/* Release the packet if not enough space. */
+		len = mlxbf_tmfifo_get_pkt_len(vring, desc);
+		avail = CIRC_SPACE(cons->tx_buf.head, cons->tx_buf.tail,
+				   MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+		if (len + MLXBF_TMFIFO_CON_TX_BUF_RSV_SIZE > avail) {
+			mlxbf_tmfifo_release_desc(vring, desc, len);
+			break;
+		}
+
+		mlxbf_tmfifo_console_output_one(cons, vring, desc);
+		mlxbf_tmfifo_release_desc(vring, desc, len);
+		desc = mlxbf_tmfifo_get_next_desc(vring);
+	}
+}
+
+/* Get the number of available words in Rx FIFO for receiving. */
+static int mlxbf_tmfifo_get_rx_avail(struct mlxbf_tmfifo *fifo)
+{
+	u64 sts;
+
+	sts = readq(fifo->rx_base + MLXBF_TMFIFO_RX_STS);
+	return FIELD_GET(MLXBF_TMFIFO_RX_STS__COUNT_MASK, sts);
+}
+
+/* Get the number of available words in the TmFifo for sending. */
+static int mlxbf_tmfifo_get_tx_avail(struct mlxbf_tmfifo *fifo, int vdev_id)
+{
+	int tx_reserve;
+	u32 count;
+	u64 sts;
+
+	/* Reserve some room in FIFO for console messages. */
+	if (vdev_id == VIRTIO_ID_NET)
+		tx_reserve = fifo->tx_fifo_size / MLXBF_TMFIFO_RESERVE_RATIO;
+	else
+		tx_reserve = 1;
+
+	sts = readq(fifo->tx_base + MLXBF_TMFIFO_TX_STS);
+	count = FIELD_GET(MLXBF_TMFIFO_TX_STS__COUNT_MASK, sts);
+	return fifo->tx_fifo_size - tx_reserve - count;
+}
+
+/* Console Tx (move data from the output buffer into the TmFifo). */
+static void mlxbf_tmfifo_console_tx(struct mlxbf_tmfifo *fifo, int avail)
+{
+	struct mlxbf_tmfifo_msg_hdr hdr;
+	struct mlxbf_tmfifo_vdev *cons;
+	unsigned long flags;
+	int size, seg;
+	void *addr;
+	u64 data;
+
+	/* Return if not enough space available. */
+	if (avail < MLXBF_TMFIFO_DATA_MIN_WORDS)
+		return;
+
+	cons = fifo->vdev[VIRTIO_ID_CONSOLE];
+	if (!cons || !cons->tx_buf.buf)
+		return;
+
+	/* Return if no data to send. */
+	size = CIRC_CNT(cons->tx_buf.head, cons->tx_buf.tail,
+			MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+	if (size == 0)
+		return;
+
+	/* Adjust the size to available space. */
+	if (size + sizeof(hdr) > avail * sizeof(u64))
+		size = avail * sizeof(u64) - sizeof(hdr);
+
+	/* Write header. */
+	hdr.type = VIRTIO_ID_CONSOLE;
+	hdr.len = htons(size);
+	writeq(*(u64 *)&hdr, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+
+	/* Use spin-lock to protect the 'cons->tx_buf'. */
+	spin_lock_irqsave(&fifo->spin_lock, flags);
+
+	while (size > 0) {
+		addr = cons->tx_buf.buf + cons->tx_buf.tail;
+
+		seg = CIRC_CNT_TO_END(cons->tx_buf.head, cons->tx_buf.tail,
+				      MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+		if (seg >= sizeof(u64)) {
+			memcpy(&data, addr, sizeof(u64));
+		} else {
+			memcpy(&data, addr, seg);
+			memcpy((u8 *)&data + seg, cons->tx_buf.buf,
+			       sizeof(u64) - seg);
+		}
+		writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+
+		if (size >= sizeof(u64)) {
+			cons->tx_buf.tail = (cons->tx_buf.tail + sizeof(u64)) %
+				MLXBF_TMFIFO_CON_TX_BUF_SIZE;
+			size -= sizeof(u64);
+		} else {
+			cons->tx_buf.tail = (cons->tx_buf.tail + size) %
+				MLXBF_TMFIFO_CON_TX_BUF_SIZE;
+			size = 0;
+		}
+	}
+
+	spin_unlock_irqrestore(&fifo->spin_lock, flags);
+}
+
+/* Rx/Tx one word in the descriptor buffer. */
+static void mlxbf_tmfifo_rxtx_word(struct mlxbf_tmfifo_vring *vring,
+				   struct vring_desc *desc,
+				   bool is_rx, int len)
+{
+	struct virtio_device *vdev = vring->vq->vdev;
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	void *addr;
+	u64 data;
+
+	/* Get the buffer address of this desc. */
+	addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+
+	/* Read a word from FIFO for Rx. */
+	if (is_rx)
+		data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
+
+	if (vring->cur_len + sizeof(u64) <= len) {
+		/* The whole word. */
+		if (is_rx)
+			memcpy(addr + vring->cur_len, &data, sizeof(u64));
+		else
+			memcpy(&data, addr + vring->cur_len, sizeof(u64));
+		vring->cur_len += sizeof(u64);
+	} else {
+		/* Leftover bytes. */
+		if (is_rx)
+			memcpy(addr + vring->cur_len, &data,
+			       len - vring->cur_len);
+		else
+			memcpy(&data, addr + vring->cur_len,
+			       len - vring->cur_len);
+		vring->cur_len = len;
+	}
+
+	/* Write the word into FIFO for Tx. */
+	if (!is_rx)
+		writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+}
+
+/*
+ * Rx/Tx packet header.
+ *
+ * In Rx case, the packet might be found to belong to a different vring since
+ * the TmFifo is shared by different services. In such case, the 'vring_change'
+ * flag is set.
+ */
+static void mlxbf_tmfifo_rxtx_header(struct mlxbf_tmfifo_vring *vring,
+				     struct vring_desc *desc,
+				     bool is_rx, bool *vring_change)
+{
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	struct virtio_net_config *config;
+	struct mlxbf_tmfifo_msg_hdr hdr;
+	int vdev_id, hdr_len;
+
+	/* Read/Write packet header. */
+	if (is_rx) {
+		/* Drain one word from the FIFO. */
+		*(u64 *)&hdr = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
+
+		/* Skip the length 0 packets (keepalive). */
+		if (hdr.len == 0)
+			return;
+
+		/* Check packet type. */
+		if (hdr.type == VIRTIO_ID_NET) {
+			vdev_id = VIRTIO_ID_NET;
+			hdr_len = sizeof(struct virtio_net_hdr);
+			config = &fifo->vdev[vdev_id]->config.net;
+			if (ntohs(hdr.len) > config->mtu +
+			    MLXBF_TMFIFO_NET_L2_OVERHEAD)
+				return;
+		} else {
+			vdev_id = VIRTIO_ID_CONSOLE;
+			hdr_len = 0;
+		}
+
+		/*
+		 * Check whether the new packet still belongs to this vring.
+		 * If not, update the pkt_len of the new vring.
+		 */
+		if (vdev_id != vring->vdev_id) {
+			struct mlxbf_tmfifo_vdev *tm_dev2 = fifo->vdev[vdev_id];
+
+			if (!tm_dev2)
+				return;
+			vring->desc = desc;
+			vring = &tm_dev2->vrings[MLXBF_TMFIFO_VRING_RX];
+			*vring_change = true;
+		}
+		vring->pkt_len = ntohs(hdr.len) + hdr_len;
+	} else {
+		/* Network virtio has an extra header. */
+		hdr_len = (vring->vdev_id == VIRTIO_ID_NET) ?
+			   sizeof(struct virtio_net_hdr) : 0;
+		vring->pkt_len = mlxbf_tmfifo_get_pkt_len(vring, desc);
+		hdr.type = (vring->vdev_id == VIRTIO_ID_NET) ?
+			    VIRTIO_ID_NET : VIRTIO_ID_CONSOLE;
+		hdr.len = htons(vring->pkt_len - hdr_len);
+		writeq(*(u64 *)&hdr, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+	}
+
+	vring->cur_len = hdr_len;
+	vring->rem_len = vring->pkt_len;
+	fifo->vring[is_rx] = vring;
+}
+
+/*
+ * Rx/Tx one descriptor.
+ *
+ * Return true to indicate more data available.
+ */
+static bool mlxbf_tmfifo_rxtx_one_desc(struct mlxbf_tmfifo_vring *vring,
+				       bool is_rx, int *avail)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	struct virtio_device *vdev;
+	bool vring_change = false;
+	struct vring_desc *desc;
+	unsigned long flags;
+	u32 len, idx;
+
+	vdev = &fifo->vdev[vring->vdev_id]->vdev;
+
+	/* Get the descriptor of the next packet. */
+	if (!vring->desc) {
+		desc = mlxbf_tmfifo_get_next_pkt(vring, is_rx);
+		if (!desc)
+			return false;
+	} else {
+		desc = vring->desc;
+	}
+
+	/* Beginning of a packet. Start to Rx/Tx packet header. */
+	if (vring->pkt_len == 0) {
+		mlxbf_tmfifo_rxtx_header(vring, desc, is_rx, &vring_change);
+		(*avail)--;
+
+		/* Return if new packet is for another ring. */
+		if (vring_change)
+			return false;
+		goto mlxbf_tmfifo_desc_done;
+	}
+
+	/* Get the length of this desc. */
+	len = virtio32_to_cpu(vdev, desc->len);
+	if (len > vring->rem_len)
+		len = vring->rem_len;
+
+	/* Rx/Tx one word (8 bytes) if not done. */
+	if (vring->cur_len < len) {
+		mlxbf_tmfifo_rxtx_word(vring, desc, is_rx, len);
+		(*avail)--;
+	}
+
+	/* Check again whether it's done. */
+	if (vring->cur_len == len) {
+		vring->cur_len = 0;
+		vring->rem_len -= len;
+
+		/* Get the next desc on the chain. */
+		if (vring->rem_len > 0 &&
+		    (virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT)) {
+			idx = virtio16_to_cpu(vdev, desc->next);
+			desc = &vr->desc[idx];
+			goto mlxbf_tmfifo_desc_done;
+		}
+
+		/* Done and release the pending packet. */
+		mlxbf_tmfifo_release_pending_pkt(vring);
+		desc = NULL;
+		fifo->vring[is_rx] = NULL;
+
+		/* Notify upper layer that packet is done. */
+		spin_lock_irqsave(&fifo->spin_lock, flags);
+		vring_interrupt(0, vring->vq);
+		spin_unlock_irqrestore(&fifo->spin_lock, flags);
+	}
+
+mlxbf_tmfifo_desc_done:
+	/* Save the current desc. */
+	vring->desc = desc;
+
+	return true;
+}
+
+/* Rx & Tx processing of a queue. */
+static void mlxbf_tmfifo_rxtx(struct mlxbf_tmfifo_vring *vring, bool is_rx)
+{
+	int avail = 0, devid = vring->vdev_id;
+	struct mlxbf_tmfifo *fifo;
+	bool more;
+
+	fifo = vring->fifo;
+
+	/* Return if vdev is not ready. */
+	if (!fifo->vdev[devid])
+		return;
+
+	/* Return if another vring is running. */
+	if (fifo->vring[is_rx] && fifo->vring[is_rx] != vring)
+		return;
+
+	/* Only handle console and network for now. */
+	if (WARN_ON(devid != VIRTIO_ID_NET && devid != VIRTIO_ID_CONSOLE))
+		return;
+
+	do {
+		/* Get available FIFO space. */
+		if (avail == 0) {
+			if (is_rx)
+				avail = mlxbf_tmfifo_get_rx_avail(fifo);
+			else
+				avail = mlxbf_tmfifo_get_tx_avail(fifo, devid);
+			if (avail <= 0)
+				break;
+		}
+
+		/* Console output always comes from the Tx buffer. */
+		if (!is_rx && devid == VIRTIO_ID_CONSOLE) {
+			mlxbf_tmfifo_console_tx(fifo, avail);
+			break;
+		}
+
+		/* Handle one descriptor. */
+		more = mlxbf_tmfifo_rxtx_one_desc(vring, is_rx, &avail);
+	} while (more);
+}
+
+/* Handle Rx or Tx queues. */
+static void mlxbf_tmfifo_work_rxtx(struct mlxbf_tmfifo *fifo, int queue_id,
+				   int irq_id, bool is_rx)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct mlxbf_tmfifo_vring *vring;
+	int i;
+
+	if (!test_and_clear_bit(irq_id, &fifo->pend_events) ||
+	    !fifo->irq_info[irq_id].irq)
+		return;
+
+	for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++) {
+		tm_vdev = fifo->vdev[i];
+		if (tm_vdev) {
+			vring = &tm_vdev->vrings[queue_id];
+			if (vring->vq)
+				mlxbf_tmfifo_rxtx(vring, is_rx);
+		}
+	}
+}
+
+/* Work handler for Rx and Tx case. */
+static void mlxbf_tmfifo_work_handler(struct work_struct *work)
+{
+	struct mlxbf_tmfifo *fifo;
+
+	fifo = container_of(work, struct mlxbf_tmfifo, work);
+	if (!fifo->is_ready)
+		return;
+
+	mutex_lock(&fifo->lock);
+
+	/* Tx (Send data to the TmFifo). */
+	mlxbf_tmfifo_work_rxtx(fifo, MLXBF_TMFIFO_VRING_TX,
+			       MLXBF_TM_TX_LWM_IRQ, false);
+
+	/* Rx (Receive data from the TmFifo). */
+	mlxbf_tmfifo_work_rxtx(fifo, MLXBF_TMFIFO_VRING_RX,
+			       MLXBF_TM_RX_HWM_IRQ, true);
+
+	mutex_unlock(&fifo->lock);
+}
+
+/* The notify function is called when new buffers are posted. */
+static bool mlxbf_tmfifo_virtio_notify(struct virtqueue *vq)
+{
+	struct mlxbf_tmfifo_vring *vring = vq->priv;
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct mlxbf_tmfifo *fifo;
+	unsigned long flags;
+
+	fifo = vring->fifo;
+
+	/*
+	 * Virtio maintains vrings in pairs, even number ring for Rx
+	 * and odd number ring for Tx.
+	 */
+	if (vring->index & BIT(0)) {
+		/*
+		 * Console could make blocking call with interrupts disabled.
+		 * In such case, the vring needs to be served right away. For
+		 * other cases, just set the TX LWM bit to start Tx in the
+		 * worker handler.
+		 */
+		if (vring->vdev_id == VIRTIO_ID_CONSOLE) {
+			spin_lock_irqsave(&fifo->spin_lock, flags);
+			tm_vdev = fifo->vdev[VIRTIO_ID_CONSOLE];
+			mlxbf_tmfifo_console_output(tm_vdev, vring);
+			spin_unlock_irqrestore(&fifo->spin_lock, flags);
+		} else if (test_and_set_bit(MLXBF_TM_TX_LWM_IRQ,
+					    &fifo->pend_events)) {
+			return true;
+		}
+	} else {
+		if (test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events))
+			return true;
+	}
+
+	schedule_work(&fifo->work);
+
+	return true;
+}
+
+/* Get the array of feature bits for this device. */
+static u64 mlxbf_tmfifo_virtio_get_features(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	return tm_vdev->features;
+}
+
+/* Confirm device features to use. */
+static int mlxbf_tmfifo_virtio_finalize_features(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->features = vdev->features;
+
+	return 0;
+}
+
+/* Free virtqueues found by find_vqs(). */
+static void mlxbf_tmfifo_virtio_del_vqs(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+	struct mlxbf_tmfifo_vring *vring;
+	struct virtqueue *vq;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+
+		/* Release the pending packet. */
+		if (vring->desc)
+			mlxbf_tmfifo_release_pending_pkt(vring);
+		vq = vring->vq;
+		if (vq) {
+			vring->vq = NULL;
+			vring_del_virtqueue(vq);
+		}
+	}
+}
+
+/* Create and initialize the virtual queues. */
+static int mlxbf_tmfifo_virtio_find_vqs(struct virtio_device *vdev,
+					unsigned int nvqs,
+					struct virtqueue *vqs[],
+					vq_callback_t *callbacks[],
+					const char * const names[],
+					const bool *ctx,
+					struct irq_affinity *desc)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+	struct mlxbf_tmfifo_vring *vring;
+	struct virtqueue *vq;
+	int i, ret, size;
+
+	if (nvqs > ARRAY_SIZE(tm_vdev->vrings))
+		return -EINVAL;
+
+	for (i = 0; i < nvqs; ++i) {
+		if (!names[i]) {
+			ret = -EINVAL;
+			goto error;
+		}
+		vring = &tm_vdev->vrings[i];
+
+		/* zero vring */
+		size = vring_size(vring->num, vring->align);
+		memset(vring->va, 0, size);
+		vq = vring_new_virtqueue(i, vring->num, vring->align, vdev,
+					 false, false, vring->va,
+					 mlxbf_tmfifo_virtio_notify,
+					 callbacks[i], names[i]);
+		if (!vq) {
+			dev_err(&vdev->dev, "vring_new_virtqueue failed\n");
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		vqs[i] = vq;
+		vring->vq = vq;
+		vq->priv = vring;
+	}
+
+	return 0;
+
+error:
+	mlxbf_tmfifo_virtio_del_vqs(vdev);
+	return ret;
+}
+
+/* Read the status byte. */
+static u8 mlxbf_tmfifo_virtio_get_status(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	return tm_vdev->status;
+}
+
+/* Write the status byte. */
+static void mlxbf_tmfifo_virtio_set_status(struct virtio_device *vdev,
+					   u8 status)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->status = status;
+}
+
+/* Reset the device. Not much here for now. */
+static void mlxbf_tmfifo_virtio_reset(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->status = 0;
+}
+
+/* Read the value of a configuration field. */
+static void mlxbf_tmfifo_virtio_get(struct virtio_device *vdev,
+				    unsigned int offset,
+				    void *buf,
+				    unsigned int len)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	if ((u64)offset + len > sizeof(tm_vdev->config))
+		return;
+
+	memcpy(buf, (u8 *)&tm_vdev->config + offset, len);
+}
+
+/* Write the value of a configuration field. */
+static void mlxbf_tmfifo_virtio_set(struct virtio_device *vdev,
+				    unsigned int offset,
+				    const void *buf,
+				    unsigned int len)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	if ((u64)offset + len > sizeof(tm_vdev->config))
+		return;
+
+	memcpy((u8 *)&tm_vdev->config + offset, buf, len);
+}
+
+static void tmfifo_virtio_dev_release(struct device *device)
+{
+	struct virtio_device *vdev =
+			container_of(device, struct virtio_device, dev);
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	kfree(tm_vdev);
+}
+
+/* Virtio config operations. */
+static const struct virtio_config_ops mlxbf_tmfifo_virtio_config_ops = {
+	.get_features = mlxbf_tmfifo_virtio_get_features,
+	.finalize_features = mlxbf_tmfifo_virtio_finalize_features,
+	.find_vqs = mlxbf_tmfifo_virtio_find_vqs,
+	.del_vqs = mlxbf_tmfifo_virtio_del_vqs,
+	.reset = mlxbf_tmfifo_virtio_reset,
+	.set_status = mlxbf_tmfifo_virtio_set_status,
+	.get_status = mlxbf_tmfifo_virtio_get_status,
+	.get = mlxbf_tmfifo_virtio_get,
+	.set = mlxbf_tmfifo_virtio_set,
+};
+
+/* Create vdev for the FIFO. */
+static int mlxbf_tmfifo_create_vdev(struct device *dev,
+				    struct mlxbf_tmfifo *fifo,
+				    int vdev_id, u64 features,
+				    void *config, u32 size)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev, *reg_dev = NULL;
+	int ret;
+
+	mutex_lock(&fifo->lock);
+
+	tm_vdev = fifo->vdev[vdev_id];
+	if (tm_vdev) {
+		dev_err(dev, "vdev %d already exists\n", vdev_id);
+		ret = -EEXIST;
+		goto fail;
+	}
+
+	tm_vdev = kzalloc(sizeof(*tm_vdev), GFP_KERNEL);
+	if (!tm_vdev) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	tm_vdev->vdev.id.device = vdev_id;
+	tm_vdev->vdev.config = &mlxbf_tmfifo_virtio_config_ops;
+	tm_vdev->vdev.dev.parent = dev;
+	tm_vdev->vdev.dev.release = tmfifo_virtio_dev_release;
+	tm_vdev->features = features;
+	if (config)
+		memcpy(&tm_vdev->config, config, size);
+
+	if (mlxbf_tmfifo_alloc_vrings(fifo, tm_vdev)) {
+		dev_err(dev, "unable to allocate vring\n");
+		ret = -ENOMEM;
+		goto vdev_fail;
+	}
+
+	/* Allocate an output buffer for the console device. */
+	if (vdev_id == VIRTIO_ID_CONSOLE)
+		tm_vdev->tx_buf.buf = devm_kmalloc(dev,
+						   MLXBF_TMFIFO_CON_TX_BUF_SIZE,
+						   GFP_KERNEL);
+	fifo->vdev[vdev_id] = tm_vdev;
+
+	/* Register the virtio device. */
+	ret = register_virtio_device(&tm_vdev->vdev);
+	reg_dev = tm_vdev;
+	if (ret) {
+		dev_err(dev, "register_virtio_device failed\n");
+		goto vdev_fail;
+	}
+
+	mutex_unlock(&fifo->lock);
+	return 0;
+
+vdev_fail:
+	mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
+	fifo->vdev[vdev_id] = NULL;
+	if (reg_dev)
+		put_device(&tm_vdev->vdev.dev);
+	else
+		kfree(tm_vdev);
+fail:
+	mutex_unlock(&fifo->lock);
+	return ret;
+}
+
+/* Delete vdev for the FIFO. */
+static int mlxbf_tmfifo_delete_vdev(struct mlxbf_tmfifo *fifo, int vdev_id)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	mutex_lock(&fifo->lock);
+
+	/* Unregister vdev. */
+	tm_vdev = fifo->vdev[vdev_id];
+	if (tm_vdev) {
+		unregister_virtio_device(&tm_vdev->vdev);
+		mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
+		fifo->vdev[vdev_id] = NULL;
+	}
+
+	mutex_unlock(&fifo->lock);
+
+	return 0;
+}
+
+/* Read the configured network MAC address from efi variable. */
+static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
+{
+	efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
+	unsigned long size = ETH_ALEN;
+	u8 buf[ETH_ALEN];
+	efi_status_t rc;
+
+	rc = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size, buf);
+	if (rc == EFI_SUCCESS && size == ETH_ALEN)
+		ether_addr_copy(mac, buf);
+	else
+		ether_addr_copy(mac, mlxbf_tmfifo_net_default_mac);
+}
+
+/* Set TmFifo thresholds which are used to trigger interrupts. */
+static void mlxbf_tmfifo_set_threshold(struct mlxbf_tmfifo *fifo)
+{
+	u64 ctl;
+
+	/* Get Tx FIFO size and set the low/high watermark. */
+	ctl = readq(fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
+	fifo->tx_fifo_size =
+		FIELD_GET(MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK, ctl);
+	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__LWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__LWM_MASK,
+			   fifo->tx_fifo_size / 2);
+	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__HWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__HWM_MASK,
+			   fifo->tx_fifo_size - 1);
+	writeq(ctl, fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
+
+	/* Get Rx FIFO size and set the low/high watermark. */
+	ctl = readq(fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
+	fifo->rx_fifo_size =
+		FIELD_GET(MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK, ctl);
+	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__LWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__LWM_MASK, 0);
+	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__HWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__HWM_MASK, 1);
+	writeq(ctl, fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
+}
+
+static void mlxbf_tmfifo_cleanup(struct mlxbf_tmfifo *fifo)
+{
+	int i;
+
+	fifo->is_ready = false;
+	del_timer_sync(&fifo->timer);
+	mlxbf_tmfifo_disable_irqs(fifo);
+	cancel_work_sync(&fifo->work);
+	for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++)
+		mlxbf_tmfifo_delete_vdev(fifo, i);
+}
+
+/* Probe the TMFIFO. */
+static int mlxbf_tmfifo_probe(struct platform_device *pdev)
+{
+	struct virtio_net_config net_config;
+	struct device *dev = &pdev->dev;
+	struct mlxbf_tmfifo *fifo;
+	int i, rc;
+
+	fifo = devm_kzalloc(dev, sizeof(*fifo), GFP_KERNEL);
+	if (!fifo)
+		return -ENOMEM;
+
+	spin_lock_init(&fifo->spin_lock);
+	INIT_WORK(&fifo->work, mlxbf_tmfifo_work_handler);
+	mutex_init(&fifo->lock);
+
+	/* Get the resource of the Rx FIFO. */
+	fifo->rx_base = devm_platform_ioremap_resource(pdev, 0);
+	if (IS_ERR(fifo->rx_base))
+		return PTR_ERR(fifo->rx_base);
+
+	/* Get the resource of the Tx FIFO. */
+	fifo->tx_base = devm_platform_ioremap_resource(pdev, 1);
+	if (IS_ERR(fifo->tx_base))
+		return PTR_ERR(fifo->tx_base);
+
+	platform_set_drvdata(pdev, fifo);
+
+	timer_setup(&fifo->timer, mlxbf_tmfifo_timer, 0);
+
+	for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
+		fifo->irq_info[i].index = i;
+		fifo->irq_info[i].fifo = fifo;
+		fifo->irq_info[i].irq = platform_get_irq(pdev, i);
+		rc = devm_request_irq(dev, fifo->irq_info[i].irq,
+				      mlxbf_tmfifo_irq_handler, 0,
+				      "tmfifo", &fifo->irq_info[i]);
+		if (rc) {
+			dev_err(dev, "devm_request_irq failed\n");
+			fifo->irq_info[i].irq = 0;
+			return rc;
+		}
+	}
+
+	mlxbf_tmfifo_set_threshold(fifo);
+
+	/* Create the console vdev. */
+	rc = mlxbf_tmfifo_create_vdev(dev, fifo, VIRTIO_ID_CONSOLE, 0, NULL, 0);
+	if (rc)
+		goto fail;
+
+	/* Create the network vdev. */
+	memset(&net_config, 0, sizeof(net_config));
+	net_config.mtu = ETH_DATA_LEN;
+	net_config.status = VIRTIO_NET_S_LINK_UP;
+	mlxbf_tmfifo_get_cfg_mac(net_config.mac);
+	rc = mlxbf_tmfifo_create_vdev(dev, fifo, VIRTIO_ID_NET,
+				      MLXBF_TMFIFO_NET_FEATURES, &net_config,
+				      sizeof(net_config));
+	if (rc)
+		goto fail;
+
+	mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
+
+	fifo->is_ready = true;
+	return 0;
+
+fail:
+	mlxbf_tmfifo_cleanup(fifo);
+	return rc;
+}
+
+/* Device remove function. */
+static int mlxbf_tmfifo_remove(struct platform_device *pdev)
+{
+	struct mlxbf_tmfifo *fifo = platform_get_drvdata(pdev);
+
+	mlxbf_tmfifo_cleanup(fifo);
+
+	return 0;
+}
+
+static const struct acpi_device_id mlxbf_tmfifo_acpi_match[] = {
+	{ "MLNXBF01", 0 },
+	{}
+};
+MODULE_DEVICE_TABLE(acpi, mlxbf_tmfifo_acpi_match);
+
+static struct platform_driver mlxbf_tmfifo_driver = {
+	.probe = mlxbf_tmfifo_probe,
+	.remove = mlxbf_tmfifo_remove,
+	.driver = {
+		.name = "bf-tmfifo",
+		.acpi_match_table = mlxbf_tmfifo_acpi_match,
+	},
+};
+
+module_platform_driver(mlxbf_tmfifo_driver);
+
+MODULE_DESCRIPTION("Mellanox BlueField SoC TmFifo Driver");
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Mellanox Technologies");
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v16] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
       [not found] <b143b40446c1870fb8d422b364ead95d54552be9.1527264077.git.lsun@mellanox.com>
                   ` (10 preceding siblings ...)
  2019-04-12 17:30 ` [PATCH v15] " Liming Sun
@ 2019-05-03 13:49 ` Liming Sun
  2019-05-06  9:13   ` Andy Shevchenko
  11 siblings, 1 reply; 30+ messages in thread
From: Liming Sun @ 2019-05-03 13:49 UTC (permalink / raw)
  To: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak
  Cc: Liming Sun, linux-kernel, platform-driver-x86

This commit adds the TmFifo platform driver for Mellanox BlueField
Soc. TmFifo is a shared FIFO which enables an external host machine
to exchange data with the SoC via USB or PCIe. The driver is based
on virtio framework and has console and network access enabled.

Reviewed-by: Vadim Pasternak <vadimp@mellanox.com>
Signed-off-by: Liming Sun <lsun@mellanox.com>
---
v15->v16:
    Rebase and resubmit (no new changes).
v14->v15:
    Fixes for comments from Andy:
    - Remove the 'union' definition of mlxbf_tmfifo_msg_hdr and use
      on-the-fly conversion when sending the 8-byte message header
      into the FIFO;
    - Update comment of mlxbf_tmfifo_msg_hdr explaining why '__be16'
      is needed for the 'len' field. The SoC sends data stream into
      the FIFO and the other side reads it. The byte order of the data
      stream (byte-stream) stays the same. The 'len' field is encoded
      into network byte order so upper-level applications in an external
      host machine with different endianness can decode it. This
      implementation was verified over USB with an external PPC host
      machine running in big-endian mode.
    - Move the 'dev_err()' line to the end of the block in function
      mlxbf_tmfifo_alloc_vrings();
    - Remove the 'irq_info->index < MLXBF_TM_MAX_IRQ' check in
      mlxbf_tmfifo_irq_handler() since it's unnecessary;
    - Remove the 'if (desc_head)' check in
      mlxbf_tmfifo_release_pending_pkt() since function
      mlxbf_tmfifo_get_pkt_len() is already NULL-aware;
    - Adjust the testing order of 'if (!(vring->index & BIT(0)))'
      in bool mlxbf_tmfifo_virtio_notify() to test the positive case
      'if (vring->index & BIT(0))' first;
    - Add '(u64)offset' conversion in mlxbf_tmfifo_virtio_get() to
      avoid 32-bit length addition overflow;
    - Update the 'efi.get_variable' statement into single line in
      mlxbf_tmfifo_get_cfg_mac();
    - Use new helper devm_platform_ioremap_resource() to replace
      'platform_get_resource() + devm_ioremap_resource()' in
      mlxbf_tmfifo_probe();
v13->v14:
    Fixes for comments from Andy:
    - Add a blank line to separate the virtio header files;
    - Update the comment for 'union mlxbf_tmfifo_msg_hdr' to be
      more clear how this union is used;
    - Update the 'mlxbf_tmfifo_net_default_mac[ETH_ALEN]' definition
      to be two lines;
    - Reformat macro MLXBF_TMFIFO_NET_FEATURES to put the definition
      in a separate line;
    - Update all 'fifo' to 'FIFO' in the comments;
    - Update mlxbf_tmfifo_alloc_vrings() to specifically release the
      allocated entries in case of failures, so the logic looks more
      clear. In the caller function the mlxbf_tmfifo_free_vrings()
      might be called again in case of other failures, which is ok
      since the 'va' pointer will be set to NULL once released;
    - Update mlxbf_tmfifo_timer() to change the first statement to
      one line;
    - Update one memcpy() to ether_addr_copy() in
      mlxbf_tmfifo_get_cfg_mac();
    - Remove 'fifo->pdev' since it is really not needed;
    - Define temporary variable to update the mlxbf_tmfifo_create_vdev()
      statement into single line.
    New changes by Liming:
    - Reorder the logic a little bit in mlxbf_tmfifo_timer(). Previously
      it had logic like "!a || !b", where '!b' is not evaluated when
      '!a' is true. It was changed to this form during review, but that
      is actually not the desired behavior since both bits need to be
      tested/set in fifo->pend_events. This issue was found during
      verification and caused extra delays for Tx packets.
v12->v13:
    Rebase and resubmit (no new changes).
v11->v12:
    Fixed the two unresolved comments from v11.
    - "Change macro mlxbf_vdev_to_tmfifo() to one line"
      Done. Seems not hard.
    - "Is it appropriate use of devm_* for 'tm_vdev = devm_kzalloc'"
      Yes, understand the comment now. The tmfifo is fixed, but the
      vdev is dynamic. Use kzalloc() instead, and free the device
      in the release callback which is the right place for it.
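
    A minimal sketch of that lifetime pattern, for illustration only
    (the names below are made up, not taken from the driver):

	#include <linux/device.h>
	#include <linux/slab.h>
	#include <linux/virtio.h>

	struct example_vdev {
		struct virtio_device vdev;	/* embedded, refcounted device */
		/* ... per-device state ... */
	};

	static void example_vdev_release(struct device *dev)
	{
		struct virtio_device *v = dev_to_virtio(dev);

		kfree(container_of(v, struct example_vdev, vdev));
	}

	static struct example_vdev *example_vdev_alloc(struct device *parent)
	{
		struct example_vdev *ev = kzalloc(sizeof(*ev), GFP_KERNEL);

		if (!ev)
			return NULL;
		ev->vdev.dev.parent = parent;
		/* Freed only when the last reference to the device drops. */
		ev->vdev.dev.release = example_vdev_release;
		return ev;
	}
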
v10->v11:
    Fixes for comments from Andy:
    - Use GENMASK_ULL() instead of GENMASK() in mlxbf-tmfifo-regs.h
    - Removed the cpu_to_le64()/le64_to_cpu() conversion since
      readq()/writeq() already takes care of it.
    - Remove the "if (irq)" check in mlxbf_tmfifo_disable_irqs().
    - Add "u32 count" temp variable in mlxbf_tmfifo_get_tx_avail().
    - Clean up mlxbf_tmfifo_get_cfg_mac(), use ETH_ALEN instead of
      value 6.
    - Change the tx_buf to use the existing Linux 'struct circ_buf'
      (see the usage sketch after this section).
    Comment not applied:
    - "Change macro mlxbf_vdev_to_tmfifo() to one line"
      Couldn't fit in one line within 80 characters.
    - "Is it appropriate use of devm_* for 'tm_vdev = devm_kzalloc'"
      This is a SoC; the device won't be closed or detached.
      The only case is when the driver is unloaded. So it appears
      ok to use devm_kzalloc() since it's allocated during probe()
      and released during module unload.
    Comments from Vadim: OK
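
    For reference, an illustrative sketch (names are made up) of the
    producer side of the 'struct circ_buf' pattern mentioned above;
    the buffer size must be a power of two for the CIRC_* helpers:

	#include <linux/circ_buf.h>
	#include <linux/errno.h>
	#include <linux/sizes.h>
	#include <linux/string.h>
	#include <linux/types.h>

	#define EXAMPLE_BUF_SIZE	SZ_32K

	static int example_push(struct circ_buf *cb, const u8 *data, int len)
	{
		int seg;

		if (CIRC_SPACE(cb->head, cb->tail, EXAMPLE_BUF_SIZE) < len)
			return -ENOSPC;	/* not enough room, drop or retry */

		/* Copy up to the end of the buffer, then wrap around. */
		seg = CIRC_SPACE_TO_END(cb->head, cb->tail, EXAMPLE_BUF_SIZE);
		if (len <= seg) {
			memcpy(cb->buf + cb->head, data, len);
		} else {
			memcpy(cb->buf + cb->head, data, seg);
			memcpy(cb->buf, data + seg, len - seg);
		}
		cb->head = (cb->head + len) & (EXAMPLE_BUF_SIZE - 1);
		return 0;
	}
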
v9->v10:
    Fixes for comments from Andy:
    - Use devm_ioremap_resource() instead of devm_ioremap().
    - Use kernel-doc comments.
    - Keep Makefile contents sorted.
    - Use same fixed format for offsets.
    - Use SZ_1K/SZ_32K instead of 1024/32*1024.
    - Remove unnecessary comments.
    - Use one style for max numbers.
    - More comments for mlxbf_tmfifo_vdev and mlxbf_tmfifo_data_64bit.
    - Use globally defined MTU instead of new definition.
    - Remove forward declaration of mlxbf_tmfifo_remove().
    - Remove PAGE_ALIGN() for dma_alloc_coherent().
    - Remove the cast of "struct vring *".
    - Check return result of test_and_set_bit().
    - Add a macro mlxbt_vdev_to_tmfifo().
    - Several other minor coding style comments.
    Comment not applied:
    - "Shouldn't be rather helper in EFI lib in kernel"
      Looks like efi.get_variable() is the way I found in the kernel
      tree.
    - "this one is not protected anyhow? Potential race condition"
      In mlxbf_tmfifo_console_tx(), the spin-lock is used to protect the
      'tx_buf' only, not the FIFO writes. So there is no race condition.
    - "Is __packed needed in mlxbf_tmfifo_msg_hdr".
      Yes, it is needed to make sure the structure is 8 bytes.
    Fixes for comments from Vadim:
    - Use tab in mlxbf-tmfifo-regs.h
    - Use kernel-doc comments for struct mlxbf_tmfifo_msg_hdr and
      mlxbf_tmfifo_irq_info as well.
    - Use _MAX instead of _CNT in the macro definition to be consistent.
    - Fix the MODULE_LICENSE.
    - Use BIT_ULL() instead of BIT().
    - Remove argument of 'avail' for mlxbf_tmfifo_rxtx_header() and
      mlxbf_tmfifo_rxtx_word()
    - Revise logic in mlxbf_tmfifo_rxtx_one_desc() to remove the
      WARN_ON().
    - Change "union mlxbf_tmfifo_u64 u" to "union mlxbf_tmfifo_u64 buf"
      in mlxbf_tmfifo_rxtx_word().
    - Change data type of vring_change from 'int' to 'bool'.
    - Remove the blank lines after Signed-off.
    - Don't use declarations in the middle of a function.
    - Initialize the network header in a more elegant way.
    - Change label done to mlxbf_tmfifo_desc_done.
    - Remove some unnecessary comments, and several other misc coding
      style comments.
    - Simplify code logic in mlxbf_tmfifo_virtio_notify()
    New changes by Liming:
    - Simplify the Rx/Tx function arguments to make it more readable.
v8->v9:
    Fixes for comments from Andy:
    - Use modern devm_xxx() API instead.
    Fixes for comments from Vadim:
    - Split the Rx/Tx function into smaller functions.
    - File name, copyright information.
    - Function and variable name conversion.
    - Local variable and indent coding styles.
    - Remove unnecessary 'inline' declarations.
    - Use devm_xxx() APIs.
    - Move the efi_char16_t MAC address definition to global.
    - Fix warnings reported by 'checkpatch --strict'.
    - Fix warnings reported by 'make CF="-D__CHECK_ENDIAN__"'.
    - Change 'select VIRTIO_xxx' to 'depends on VIRTIO_xxx' in Kconfig.
    - Merge mlxbf_tmfifo_vdev_tx_buf_push() and
      mlxbf_tmfifo_vdev_tx_buf_pop().
    - Add union to avoid casting between __le64 and u64.
    - Several other misc coding style comments.
    New changes by Liming:
    - Removed the DT binding documentation since only ACPI is
      supported for now by UEFI on the SoC.
v8: Re-submit under drivers/platform/mellanox for the target-side
    platform driver only.
v7: Added host side drivers into the same patch set.
v5~v6: Coding style fix.
v1~v4: Initial version for directory drivers/soc/mellanox.
---
 drivers/platform/mellanox/Kconfig             |   12 +-
 drivers/platform/mellanox/Makefile            |    1 +
 drivers/platform/mellanox/mlxbf-tmfifo-regs.h |   63 ++
 drivers/platform/mellanox/mlxbf-tmfifo.c      | 1281 +++++++++++++++++++++++++
 4 files changed, 1356 insertions(+), 1 deletion(-)
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo-regs.h
 create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo.c

diff --git a/drivers/platform/mellanox/Kconfig b/drivers/platform/mellanox/Kconfig
index cd8a908..530fe7e 100644
--- a/drivers/platform/mellanox/Kconfig
+++ b/drivers/platform/mellanox/Kconfig
@@ -5,7 +5,7 @@
 
 menuconfig MELLANOX_PLATFORM
 	bool "Platform support for Mellanox hardware"
-	depends on X86 || ARM || COMPILE_TEST
+	depends on X86 || ARM || ARM64 || COMPILE_TEST
 	---help---
 	  Say Y here to get to see options for platform support for
 	  Mellanox systems. This option alone does not add any kernel code.
@@ -34,4 +34,14 @@ config MLXREG_IO
 	  to system resets operation, system reset causes monitoring and some
 	  kinds of mux selection.
 
+config MLXBF_TMFIFO
+	tristate "Mellanox BlueField SoC TmFifo platform driver"
+	depends on ARM64
+	depends on ACPI
+	depends on VIRTIO_CONSOLE && VIRTIO_NET
+	help
+	  Say y here to enable TmFifo support. The TmFifo driver provides
+          platform driver support for the TmFifo which supports console
+          and networking based on the virtio framework.
+
 endif # MELLANOX_PLATFORM
diff --git a/drivers/platform/mellanox/Makefile b/drivers/platform/mellanox/Makefile
index 57074d9c..a229bda1 100644
--- a/drivers/platform/mellanox/Makefile
+++ b/drivers/platform/mellanox/Makefile
@@ -3,5 +3,6 @@
 # Makefile for linux/drivers/platform/mellanox
 # Mellanox Platform-Specific Drivers
 #
+obj-$(CONFIG_MLXBF_TMFIFO)	+= mlxbf-tmfifo.o
 obj-$(CONFIG_MLXREG_HOTPLUG)	+= mlxreg-hotplug.o
 obj-$(CONFIG_MLXREG_IO) += mlxreg-io.o
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo-regs.h b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
new file mode 100644
index 0000000..e4f0d2e
--- /dev/null
+++ b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
+ */
+
+#ifndef __MLXBF_TMFIFO_REGS_H__
+#define __MLXBF_TMFIFO_REGS_H__
+
+#include <linux/types.h>
+#include <linux/bits.h>
+
+#define MLXBF_TMFIFO_TX_DATA				0x00
+#define MLXBF_TMFIFO_TX_STS				0x08
+#define MLXBF_TMFIFO_TX_STS__LENGTH			0x0001
+#define MLXBF_TMFIFO_TX_STS__COUNT_SHIFT		0
+#define MLXBF_TMFIFO_TX_STS__COUNT_WIDTH		9
+#define MLXBF_TMFIFO_TX_STS__COUNT_RESET_VAL		0
+#define MLXBF_TMFIFO_TX_STS__COUNT_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_TX_STS__COUNT_MASK			GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_TX_CTL				0x10
+#define MLXBF_TMFIFO_TX_CTL__LENGTH			0x0001
+#define MLXBF_TMFIFO_TX_CTL__LWM_SHIFT			0
+#define MLXBF_TMFIFO_TX_CTL__LWM_WIDTH			8
+#define MLXBF_TMFIFO_TX_CTL__LWM_RESET_VAL		128
+#define MLXBF_TMFIFO_TX_CTL__LWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__LWM_MASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__HWM_SHIFT			8
+#define MLXBF_TMFIFO_TX_CTL__HWM_WIDTH			8
+#define MLXBF_TMFIFO_TX_CTL__HWM_RESET_VAL		128
+#define MLXBF_TMFIFO_TX_CTL__HWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_TX_CTL__HWM_MASK			GENMASK_ULL(15, 8)
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_SHIFT		32
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_WIDTH		9
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RESET_VAL	256
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK		GENMASK_ULL(40, 32)
+#define MLXBF_TMFIFO_RX_DATA				0x00
+#define MLXBF_TMFIFO_RX_STS				0x08
+#define MLXBF_TMFIFO_RX_STS__LENGTH			0x0001
+#define MLXBF_TMFIFO_RX_STS__COUNT_SHIFT		0
+#define MLXBF_TMFIFO_RX_STS__COUNT_WIDTH		9
+#define MLXBF_TMFIFO_RX_STS__COUNT_RESET_VAL		0
+#define MLXBF_TMFIFO_RX_STS__COUNT_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_RX_STS__COUNT_MASK			GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_RX_CTL				0x10
+#define MLXBF_TMFIFO_RX_CTL__LENGTH			0x0001
+#define MLXBF_TMFIFO_RX_CTL__LWM_SHIFT			0
+#define MLXBF_TMFIFO_RX_CTL__LWM_WIDTH			8
+#define MLXBF_TMFIFO_RX_CTL__LWM_RESET_VAL		128
+#define MLXBF_TMFIFO_RX_CTL__LWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__LWM_MASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__HWM_SHIFT			8
+#define MLXBF_TMFIFO_RX_CTL__HWM_WIDTH			8
+#define MLXBF_TMFIFO_RX_CTL__HWM_RESET_VAL		128
+#define MLXBF_TMFIFO_RX_CTL__HWM_RMASK			GENMASK_ULL(7, 0)
+#define MLXBF_TMFIFO_RX_CTL__HWM_MASK			GENMASK_ULL(15, 8)
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_SHIFT		32
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_WIDTH		9
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RESET_VAL	256
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK		GENMASK_ULL(8, 0)
+#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK		GENMASK_ULL(40, 32)
+
+#endif /* !defined(__MLXBF_TMFIFO_REGS_H__) */
diff --git a/drivers/platform/mellanox/mlxbf-tmfifo.c b/drivers/platform/mellanox/mlxbf-tmfifo.c
new file mode 100644
index 0000000..9a5c9fd
--- /dev/null
+++ b/drivers/platform/mellanox/mlxbf-tmfifo.c
@@ -0,0 +1,1281 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Mellanox BlueField SoC TmFifo driver
+ *
+ * Copyright (C) 2019 Mellanox Technologies
+ */
+
+#include <linux/acpi.h>
+#include <linux/bitfield.h>
+#include <linux/circ_buf.h>
+#include <linux/efi.h>
+#include <linux/irq.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/platform_device.h>
+#include <linux/types.h>
+
+#include <linux/virtio_config.h>
+#include <linux/virtio_console.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_net.h>
+#include <linux/virtio_ring.h>
+
+#include "mlxbf-tmfifo-regs.h"
+
+/* Vring size. */
+#define MLXBF_TMFIFO_VRING_SIZE			SZ_1K
+
+/* Console Tx buffer size. */
+#define MLXBF_TMFIFO_CON_TX_BUF_SIZE		SZ_32K
+
+/* Console Tx buffer reserved space. */
+#define MLXBF_TMFIFO_CON_TX_BUF_RSV_SIZE	8
+
+/* House-keeping timer interval. */
+#define MLXBF_TMFIFO_TIMER_INTERVAL		(HZ / 10)
+
+/* Virtual devices sharing the TM FIFO. */
+#define MLXBF_TMFIFO_VDEV_MAX		(VIRTIO_ID_CONSOLE + 1)
+
+/*
+ * Reserve 1/16 of TmFifo space, so console messages are not starved by
+ * the networking traffic.
+ */
+#define MLXBF_TMFIFO_RESERVE_RATIO		16
+
+/* Message with data needs at least two words (for header & data). */
+#define MLXBF_TMFIFO_DATA_MIN_WORDS		2
+
+struct mlxbf_tmfifo;
+
+/**
+ * mlxbf_tmfifo_vring - Structure of the TmFifo virtual ring
+ * @va: virtual address of the ring
+ * @dma: dma address of the ring
+ * @vq: pointer to the virtio virtqueue
+ * @desc: current descriptor of the pending packet
+ * @desc_head: head descriptor of the pending packet
+ * @cur_len: processed length of the current descriptor
+ * @rem_len: remaining length of the pending packet
+ * @pkt_len: total length of the pending packet
+ * @next_avail: next avail descriptor id
+ * @num: vring size (number of descriptors)
+ * @align: vring alignment size
+ * @index: vring index
+ * @vdev_id: vring virtio id (VIRTIO_ID_xxx)
+ * @fifo: pointer to the tmfifo structure
+ */
+struct mlxbf_tmfifo_vring {
+	void *va;
+	dma_addr_t dma;
+	struct virtqueue *vq;
+	struct vring_desc *desc;
+	struct vring_desc *desc_head;
+	int cur_len;
+	int rem_len;
+	u32 pkt_len;
+	u16 next_avail;
+	int num;
+	int align;
+	int index;
+	int vdev_id;
+	struct mlxbf_tmfifo *fifo;
+};
+
+/* Interrupt types. */
+enum {
+	MLXBF_TM_RX_LWM_IRQ,
+	MLXBF_TM_RX_HWM_IRQ,
+	MLXBF_TM_TX_LWM_IRQ,
+	MLXBF_TM_TX_HWM_IRQ,
+	MLXBF_TM_MAX_IRQ
+};
+
+/* Ring types (Rx & Tx). */
+enum {
+	MLXBF_TMFIFO_VRING_RX,
+	MLXBF_TMFIFO_VRING_TX,
+	MLXBF_TMFIFO_VRING_MAX
+};
+
+/**
+ * mlxbf_tmfifo_vdev - Structure of the TmFifo virtual device
+ * @vdev: virtio device, in which the vdev.id.device field has the
+ *        VIRTIO_ID_xxx id to distinguish the virtual device.
+ * @status: status of the device
+ * @features: supported features of the device
+ * @vrings: array of tmfifo vrings of this device
+ * @config.cons: virtual console config -
+ *               select if vdev.id.device is VIRTIO_ID_CONSOLE
+ * @config.net: virtual network config -
+ *              select if vdev.id.device is VIRTIO_ID_NET
+ * @tx_buf: tx buffer used to buffer data before writing into the FIFO
+ */
+struct mlxbf_tmfifo_vdev {
+	struct virtio_device vdev;
+	u8 status;
+	u64 features;
+	struct mlxbf_tmfifo_vring vrings[MLXBF_TMFIFO_VRING_MAX];
+	union {
+		struct virtio_console_config cons;
+		struct virtio_net_config net;
+	} config;
+	struct circ_buf tx_buf;
+};
+
+/**
+ * mlxbf_tmfifo_irq_info - Structure of the interrupt information
+ * @fifo: pointer to the tmfifo structure
+ * @irq: interrupt number
+ * @index: index into the interrupt array
+ */
+struct mlxbf_tmfifo_irq_info {
+	struct mlxbf_tmfifo *fifo;
+	int irq;
+	int index;
+};
+
+/**
+ * mlxbf_tmfifo - Structure of the TmFifo
+ * @vdev: array of the virtual devices running over the TmFifo
+ * @lock: lock to protect the TmFifo access
+ * @rx_base: mapped register base address for the Rx FIFO
+ * @tx_base: mapped register base address for the Tx FIFO
+ * @rx_fifo_size: number of entries of the Rx FIFO
+ * @tx_fifo_size: number of entries of the Tx FIFO
+ * @pend_events: pending bits for deferred events
+ * @irq_info: interrupt information
+ * @work: work struct for deferred process
+ * @timer: background timer
+ * @vring: Tx/Rx ring
+ * @spin_lock: spin lock
+ * @is_ready: ready flag
+ */
+struct mlxbf_tmfifo {
+	struct mlxbf_tmfifo_vdev *vdev[MLXBF_TMFIFO_VDEV_MAX];
+	struct mutex lock;		/* TmFifo lock */
+	void __iomem *rx_base;
+	void __iomem *tx_base;
+	int rx_fifo_size;
+	int tx_fifo_size;
+	unsigned long pend_events;
+	struct mlxbf_tmfifo_irq_info irq_info[MLXBF_TM_MAX_IRQ];
+	struct work_struct work;
+	struct timer_list timer;
+	struct mlxbf_tmfifo_vring *vring[2];
+	spinlock_t spin_lock;		/* spin lock */
+	bool is_ready;
+};
+
+/**
+ * mlxbf_tmfifo_msg_hdr - Structure of the TmFifo message header
+ * @type: message type
+ * @len: payload length in network byte order. Messages sent into the FIFO
+ *       will be read by the other side as data stream in the same byte order.
+ *       The length needs to be encoded into network order so both sides
+ *       could understand it.
+ */
+struct mlxbf_tmfifo_msg_hdr {
+	u8 type;
+	__be16 len;
+	u8 unused[5];
+} __packed __aligned(sizeof(u64));
+
+/*
+ * Default MAC.
+ * This MAC address will be read from EFI persistent variable if configured.
+ * It can also be reconfigured with standard Linux tools.
+ */
+static u8 mlxbf_tmfifo_net_default_mac[ETH_ALEN] = {
+	0x00, 0x1A, 0xCA, 0xFF, 0xFF, 0x01
+};
+
+/* EFI variable name of the MAC address. */
+static efi_char16_t mlxbf_tmfifo_efi_name[] = L"RshimMacAddr";
+
+/* Maximum L2 header length. */
+#define MLXBF_TMFIFO_NET_L2_OVERHEAD	36
+
+/* Supported virtio-net features. */
+#define MLXBF_TMFIFO_NET_FEATURES \
+	(BIT_ULL(VIRTIO_NET_F_MTU) | BIT_ULL(VIRTIO_NET_F_STATUS) | \
+	 BIT_ULL(VIRTIO_NET_F_MAC))
+
+#define mlxbf_vdev_to_tmfifo(d) container_of(d, struct mlxbf_tmfifo_vdev, vdev)
+
+/* Free vrings of the FIFO device. */
+static void mlxbf_tmfifo_free_vrings(struct mlxbf_tmfifo *fifo,
+				     struct mlxbf_tmfifo_vdev *tm_vdev)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	int i, size;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+		if (vring->va) {
+			size = vring_size(vring->num, vring->align);
+			dma_free_coherent(tm_vdev->vdev.dev.parent, size,
+					  vring->va, vring->dma);
+			vring->va = NULL;
+			if (vring->vq) {
+				vring_del_virtqueue(vring->vq);
+				vring->vq = NULL;
+			}
+		}
+	}
+}
+
+/* Allocate vrings for the FIFO. */
+static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
+				     struct mlxbf_tmfifo_vdev *tm_vdev)
+{
+	struct mlxbf_tmfifo_vring *vring;
+	struct device *dev;
+	dma_addr_t dma;
+	int i, size;
+	void *va;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+		vring->fifo = fifo;
+		vring->num = MLXBF_TMFIFO_VRING_SIZE;
+		vring->align = SMP_CACHE_BYTES;
+		vring->index = i;
+		vring->vdev_id = tm_vdev->vdev.id.device;
+		dev = &tm_vdev->vdev.dev;
+
+		size = vring_size(vring->num, vring->align);
+		va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL);
+		if (!va) {
+			mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
+			dev_err(dev->parent, "dma_alloc_coherent failed\n");
+			return -ENOMEM;
+		}
+
+		vring->va = va;
+		vring->dma = dma;
+	}
+
+	return 0;
+}
+
+/* Disable interrupts of the FIFO device. */
+static void mlxbf_tmfifo_disable_irqs(struct mlxbf_tmfifo *fifo)
+{
+	int i, irq;
+
+	for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
+		irq = fifo->irq_info[i].irq;
+		fifo->irq_info[i].irq = 0;
+		disable_irq(irq);
+	}
+}
+
+/* Interrupt handler. */
+static irqreturn_t mlxbf_tmfifo_irq_handler(int irq, void *arg)
+{
+	struct mlxbf_tmfifo_irq_info *irq_info = arg;
+
+	if (!test_and_set_bit(irq_info->index, &irq_info->fifo->pend_events))
+		schedule_work(&irq_info->fifo->work);
+
+	return IRQ_HANDLED;
+}
+
+/* Get the next packet descriptor from the vring. */
+static struct vring_desc *
+mlxbf_tmfifo_get_next_desc(struct mlxbf_tmfifo_vring *vring)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	unsigned int idx, head;
+
+	if (vring->next_avail == virtio16_to_cpu(vdev, vr->avail->idx))
+		return NULL;
+
+	idx = vring->next_avail % vr->num;
+	head = virtio16_to_cpu(vdev, vr->avail->ring[idx]);
+	if (WARN_ON(head >= vr->num))
+		return NULL;
+
+	vring->next_avail++;
+
+	return &vr->desc[head];
+}
+
+/* Release virtio descriptor. */
+static void mlxbf_tmfifo_release_desc(struct mlxbf_tmfifo_vring *vring,
+				      struct vring_desc *desc, u32 len)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	u16 idx, vr_idx;
+
+	vr_idx = virtio16_to_cpu(vdev, vr->used->idx);
+	idx = vr_idx % vr->num;
+	vr->used->ring[idx].id = cpu_to_virtio32(vdev, desc - vr->desc);
+	vr->used->ring[idx].len = cpu_to_virtio32(vdev, len);
+
+	/*
+	 * Virtio could poll and check the 'idx' to decide whether the desc is
+	 * done or not. Add a memory barrier here to make sure the update above
+	 * completes before updating the idx.
+	 */
+	mb();
+	vr->used->idx = cpu_to_virtio16(vdev, vr_idx + 1);
+}
+
+/* Get the total length of the descriptor chain. */
+static u32 mlxbf_tmfifo_get_pkt_len(struct mlxbf_tmfifo_vring *vring,
+				    struct vring_desc *desc)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = vring->vq->vdev;
+	u32 len = 0, idx;
+
+	while (desc) {
+		len += virtio32_to_cpu(vdev, desc->len);
+		if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
+			break;
+		idx = virtio16_to_cpu(vdev, desc->next);
+		desc = &vr->desc[idx];
+	}
+
+	return len;
+}
+
+static void mlxbf_tmfifo_release_pending_pkt(struct mlxbf_tmfifo_vring *vring)
+{
+	struct vring_desc *desc_head;
+	u32 len = 0;
+
+	if (vring->desc_head) {
+		desc_head = vring->desc_head;
+		len = vring->pkt_len;
+	} else {
+		desc_head = mlxbf_tmfifo_get_next_desc(vring);
+		len = mlxbf_tmfifo_get_pkt_len(vring, desc_head);
+	}
+
+	if (desc_head)
+		mlxbf_tmfifo_release_desc(vring, desc_head, len);
+
+	vring->pkt_len = 0;
+	vring->desc = NULL;
+	vring->desc_head = NULL;
+}
+
+static void mlxbf_tmfifo_init_net_desc(struct mlxbf_tmfifo_vring *vring,
+				       struct vring_desc *desc, bool is_rx)
+{
+	struct virtio_device *vdev = vring->vq->vdev;
+	struct virtio_net_hdr *net_hdr;
+
+	net_hdr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+	memset(net_hdr, 0, sizeof(*net_hdr));
+}
+
+/* Get and initialize the next packet. */
+static struct vring_desc *
+mlxbf_tmfifo_get_next_pkt(struct mlxbf_tmfifo_vring *vring, bool is_rx)
+{
+	struct vring_desc *desc;
+
+	desc = mlxbf_tmfifo_get_next_desc(vring);
+	if (desc && is_rx && vring->vdev_id == VIRTIO_ID_NET)
+		mlxbf_tmfifo_init_net_desc(vring, desc, is_rx);
+
+	vring->desc_head = desc;
+	vring->desc = desc;
+
+	return desc;
+}
+
+/* House-keeping timer. */
+static void mlxbf_tmfifo_timer(struct timer_list *t)
+{
+	struct mlxbf_tmfifo *fifo = container_of(t, struct mlxbf_tmfifo, timer);
+	int rx, tx;
+
+	rx = !test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events);
+	tx = !test_and_set_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events);
+
+	if (rx || tx)
+		schedule_work(&fifo->work);
+
+	mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
+}
+
+/* Copy one console packet into the output buffer. */
+static void mlxbf_tmfifo_console_output_one(struct mlxbf_tmfifo_vdev *cons,
+					    struct mlxbf_tmfifo_vring *vring,
+					    struct vring_desc *desc)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct virtio_device *vdev = &cons->vdev;
+	u32 len, idx, seg;
+	void *addr;
+
+	while (desc) {
+		addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+		len = virtio32_to_cpu(vdev, desc->len);
+
+		seg = CIRC_SPACE_TO_END(cons->tx_buf.head, cons->tx_buf.tail,
+					MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+		if (len <= seg) {
+			memcpy(cons->tx_buf.buf + cons->tx_buf.head, addr, len);
+		} else {
+			memcpy(cons->tx_buf.buf + cons->tx_buf.head, addr, seg);
+			addr += seg;
+			memcpy(cons->tx_buf.buf, addr, len - seg);
+		}
+		cons->tx_buf.head = (cons->tx_buf.head + len) %
+			MLXBF_TMFIFO_CON_TX_BUF_SIZE;
+
+		if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
+			break;
+		idx = virtio16_to_cpu(vdev, desc->next);
+		desc = &vr->desc[idx];
+	}
+}
+
+/* Copy console data into the output buffer. */
+static void mlxbf_tmfifo_console_output(struct mlxbf_tmfifo_vdev *cons,
+					struct mlxbf_tmfifo_vring *vring)
+{
+	struct vring_desc *desc;
+	u32 len, avail;
+
+	desc = mlxbf_tmfifo_get_next_desc(vring);
+	while (desc) {
+		/* Release the packet if not enough space. */
+		len = mlxbf_tmfifo_get_pkt_len(vring, desc);
+		avail = CIRC_SPACE(cons->tx_buf.head, cons->tx_buf.tail,
+				   MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+		if (len + MLXBF_TMFIFO_CON_TX_BUF_RSV_SIZE > avail) {
+			mlxbf_tmfifo_release_desc(vring, desc, len);
+			break;
+		}
+
+		mlxbf_tmfifo_console_output_one(cons, vring, desc);
+		mlxbf_tmfifo_release_desc(vring, desc, len);
+		desc = mlxbf_tmfifo_get_next_desc(vring);
+	}
+}
+
+/* Get the number of available words in Rx FIFO for receiving. */
+static int mlxbf_tmfifo_get_rx_avail(struct mlxbf_tmfifo *fifo)
+{
+	u64 sts;
+
+	sts = readq(fifo->rx_base + MLXBF_TMFIFO_RX_STS);
+	return FIELD_GET(MLXBF_TMFIFO_RX_STS__COUNT_MASK, sts);
+}
+
+/* Get the number of available words in the TmFifo for sending. */
+static int mlxbf_tmfifo_get_tx_avail(struct mlxbf_tmfifo *fifo, int vdev_id)
+{
+	int tx_reserve;
+	u32 count;
+	u64 sts;
+
+	/* Reserve some room in FIFO for console messages. */
+	if (vdev_id == VIRTIO_ID_NET)
+		tx_reserve = fifo->tx_fifo_size / MLXBF_TMFIFO_RESERVE_RATIO;
+	else
+		tx_reserve = 1;
+
+	sts = readq(fifo->tx_base + MLXBF_TMFIFO_TX_STS);
+	count = FIELD_GET(MLXBF_TMFIFO_TX_STS__COUNT_MASK, sts);
+	return fifo->tx_fifo_size - tx_reserve - count;
+}
+
+/* Console Tx (move data from the output buffer into the TmFifo). */
+static void mlxbf_tmfifo_console_tx(struct mlxbf_tmfifo *fifo, int avail)
+{
+	struct mlxbf_tmfifo_msg_hdr hdr;
+	struct mlxbf_tmfifo_vdev *cons;
+	unsigned long flags;
+	int size, seg;
+	void *addr;
+	u64 data;
+
+	/* Return if not enough space available. */
+	if (avail < MLXBF_TMFIFO_DATA_MIN_WORDS)
+		return;
+
+	cons = fifo->vdev[VIRTIO_ID_CONSOLE];
+	if (!cons || !cons->tx_buf.buf)
+		return;
+
+	/* Return if no data to send. */
+	size = CIRC_CNT(cons->tx_buf.head, cons->tx_buf.tail,
+			MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+	if (size == 0)
+		return;
+
+	/* Adjust the size to available space. */
+	if (size + sizeof(hdr) > avail * sizeof(u64))
+		size = avail * sizeof(u64) - sizeof(hdr);
+
+	/* Write header. */
+	hdr.type = VIRTIO_ID_CONSOLE;
+	hdr.len = htons(size);
+	writeq(*(u64 *)&hdr, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+
+	/* Use spin-lock to protect the 'cons->tx_buf'. */
+	spin_lock_irqsave(&fifo->spin_lock, flags);
+
+	while (size > 0) {
+		addr = cons->tx_buf.buf + cons->tx_buf.tail;
+
+		seg = CIRC_CNT_TO_END(cons->tx_buf.head, cons->tx_buf.tail,
+				      MLXBF_TMFIFO_CON_TX_BUF_SIZE);
+		if (seg >= sizeof(u64)) {
+			memcpy(&data, addr, sizeof(u64));
+		} else {
+			memcpy(&data, addr, seg);
+			memcpy((u8 *)&data + seg, cons->tx_buf.buf,
+			       sizeof(u64) - seg);
+		}
+		writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+
+		if (size >= sizeof(u64)) {
+			cons->tx_buf.tail = (cons->tx_buf.tail + sizeof(u64)) %
+				MLXBF_TMFIFO_CON_TX_BUF_SIZE;
+			size -= sizeof(u64);
+		} else {
+			cons->tx_buf.tail = (cons->tx_buf.tail + size) %
+				MLXBF_TMFIFO_CON_TX_BUF_SIZE;
+			size = 0;
+		}
+	}
+
+	spin_unlock_irqrestore(&fifo->spin_lock, flags);
+}
+
+/* Rx/Tx one word in the descriptor buffer. */
+static void mlxbf_tmfifo_rxtx_word(struct mlxbf_tmfifo_vring *vring,
+				   struct vring_desc *desc,
+				   bool is_rx, int len)
+{
+	struct virtio_device *vdev = vring->vq->vdev;
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	void *addr;
+	u64 data;
+
+	/* Get the buffer address of this desc. */
+	addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
+
+	/* Read a word from FIFO for Rx. */
+	if (is_rx)
+		data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
+
+	if (vring->cur_len + sizeof(u64) <= len) {
+		/* The whole word. */
+		if (is_rx)
+			memcpy(addr + vring->cur_len, &data, sizeof(u64));
+		else
+			memcpy(&data, addr + vring->cur_len, sizeof(u64));
+		vring->cur_len += sizeof(u64);
+	} else {
+		/* Leftover bytes. */
+		if (is_rx)
+			memcpy(addr + vring->cur_len, &data,
+			       len - vring->cur_len);
+		else
+			memcpy(&data, addr + vring->cur_len,
+			       len - vring->cur_len);
+		vring->cur_len = len;
+	}
+
+	/* Write the word into FIFO for Tx. */
+	if (!is_rx)
+		writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+}
+
+/*
+ * Rx/Tx packet header.
+ *
+ * In Rx case, the packet might be found to belong to a different vring since
+ * the TmFifo is shared by different services. In such case, the 'vring_change'
+ * flag is set.
+ */
+static void mlxbf_tmfifo_rxtx_header(struct mlxbf_tmfifo_vring *vring,
+				     struct vring_desc *desc,
+				     bool is_rx, bool *vring_change)
+{
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	struct virtio_net_config *config;
+	struct mlxbf_tmfifo_msg_hdr hdr;
+	int vdev_id, hdr_len;
+
+	/* Read/Write packet header. */
+	if (is_rx) {
+		/* Drain one word from the FIFO. */
+		*(u64 *)&hdr = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
+
+		/* Skip the length 0 packets (keepalive). */
+		if (hdr.len == 0)
+			return;
+
+		/* Check packet type. */
+		if (hdr.type == VIRTIO_ID_NET) {
+			vdev_id = VIRTIO_ID_NET;
+			hdr_len = sizeof(struct virtio_net_hdr);
+			config = &fifo->vdev[vdev_id]->config.net;
+			if (ntohs(hdr.len) > config->mtu +
+			    MLXBF_TMFIFO_NET_L2_OVERHEAD)
+				return;
+		} else {
+			vdev_id = VIRTIO_ID_CONSOLE;
+			hdr_len = 0;
+		}
+
+		/*
+		 * Check whether the new packet still belongs to this vring.
+		 * If not, update the pkt_len of the new vring.
+		 */
+		if (vdev_id != vring->vdev_id) {
+			struct mlxbf_tmfifo_vdev *tm_dev2 = fifo->vdev[vdev_id];
+
+			if (!tm_dev2)
+				return;
+			vring->desc = desc;
+			vring = &tm_dev2->vrings[MLXBF_TMFIFO_VRING_RX];
+			*vring_change = true;
+		}
+		vring->pkt_len = ntohs(hdr.len) + hdr_len;
+	} else {
+		/* Network virtio has an extra header. */
+		hdr_len = (vring->vdev_id == VIRTIO_ID_NET) ?
+			   sizeof(struct virtio_net_hdr) : 0;
+		vring->pkt_len = mlxbf_tmfifo_get_pkt_len(vring, desc);
+		hdr.type = (vring->vdev_id == VIRTIO_ID_NET) ?
+			    VIRTIO_ID_NET : VIRTIO_ID_CONSOLE;
+		hdr.len = htons(vring->pkt_len - hdr_len);
+		writeq(*(u64 *)&hdr, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
+	}
+
+	vring->cur_len = hdr_len;
+	vring->rem_len = vring->pkt_len;
+	fifo->vring[is_rx] = vring;
+}
+
+/*
+ * Rx/Tx one descriptor.
+ *
+ * Return true to indicate more data available.
+ */
+static bool mlxbf_tmfifo_rxtx_one_desc(struct mlxbf_tmfifo_vring *vring,
+				       bool is_rx, int *avail)
+{
+	const struct vring *vr = virtqueue_get_vring(vring->vq);
+	struct mlxbf_tmfifo *fifo = vring->fifo;
+	struct virtio_device *vdev;
+	bool vring_change = false;
+	struct vring_desc *desc;
+	unsigned long flags;
+	u32 len, idx;
+
+	vdev = &fifo->vdev[vring->vdev_id]->vdev;
+
+	/* Get the descriptor of the next packet. */
+	if (!vring->desc) {
+		desc = mlxbf_tmfifo_get_next_pkt(vring, is_rx);
+		if (!desc)
+			return false;
+	} else {
+		desc = vring->desc;
+	}
+
+	/* Beginning of a packet. Start to Rx/Tx packet header. */
+	if (vring->pkt_len == 0) {
+		mlxbf_tmfifo_rxtx_header(vring, desc, is_rx, &vring_change);
+		(*avail)--;
+
+		/* Return if new packet is for another ring. */
+		if (vring_change)
+			return false;
+		goto mlxbf_tmfifo_desc_done;
+	}
+
+	/* Get the length of this desc. */
+	len = virtio32_to_cpu(vdev, desc->len);
+	if (len > vring->rem_len)
+		len = vring->rem_len;
+
+	/* Rx/Tx one word (8 bytes) if not done. */
+	if (vring->cur_len < len) {
+		mlxbf_tmfifo_rxtx_word(vring, desc, is_rx, len);
+		(*avail)--;
+	}
+
+	/* Check again whether it's done. */
+	if (vring->cur_len == len) {
+		vring->cur_len = 0;
+		vring->rem_len -= len;
+
+		/* Get the next desc on the chain. */
+		if (vring->rem_len > 0 &&
+		    (virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT)) {
+			idx = virtio16_to_cpu(vdev, desc->next);
+			desc = &vr->desc[idx];
+			goto mlxbf_tmfifo_desc_done;
+		}
+
+		/* Done and release the pending packet. */
+		mlxbf_tmfifo_release_pending_pkt(vring);
+		desc = NULL;
+		fifo->vring[is_rx] = NULL;
+
+		/* Notify upper layer that packet is done. */
+		spin_lock_irqsave(&fifo->spin_lock, flags);
+		vring_interrupt(0, vring->vq);
+		spin_unlock_irqrestore(&fifo->spin_lock, flags);
+	}
+
+mlxbf_tmfifo_desc_done:
+	/* Save the current desc. */
+	vring->desc = desc;
+
+	return true;
+}
+
+/* Rx & Tx processing of a queue. */
+static void mlxbf_tmfifo_rxtx(struct mlxbf_tmfifo_vring *vring, bool is_rx)
+{
+	int avail = 0, devid = vring->vdev_id;
+	struct mlxbf_tmfifo *fifo;
+	bool more;
+
+	fifo = vring->fifo;
+
+	/* Return if vdev is not ready. */
+	if (!fifo->vdev[devid])
+		return;
+
+	/* Return if another vring is running. */
+	if (fifo->vring[is_rx] && fifo->vring[is_rx] != vring)
+		return;
+
+	/* Only handle console and network for now. */
+	if (WARN_ON(devid != VIRTIO_ID_NET && devid != VIRTIO_ID_CONSOLE))
+		return;
+
+	do {
+		/* Get available FIFO space. */
+		if (avail == 0) {
+			if (is_rx)
+				avail = mlxbf_tmfifo_get_rx_avail(fifo);
+			else
+				avail = mlxbf_tmfifo_get_tx_avail(fifo, devid);
+			if (avail <= 0)
+				break;
+		}
+
+		/* Console output always comes from the Tx buffer. */
+		if (!is_rx && devid == VIRTIO_ID_CONSOLE) {
+			mlxbf_tmfifo_console_tx(fifo, avail);
+			break;
+		}
+
+		/* Handle one descriptor. */
+		more = mlxbf_tmfifo_rxtx_one_desc(vring, is_rx, &avail);
+	} while (more);
+}
+
+/* Handle Rx or Tx queues. */
+static void mlxbf_tmfifo_work_rxtx(struct mlxbf_tmfifo *fifo, int queue_id,
+				   int irq_id, bool is_rx)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct mlxbf_tmfifo_vring *vring;
+	int i;
+
+	if (!test_and_clear_bit(irq_id, &fifo->pend_events) ||
+	    !fifo->irq_info[irq_id].irq)
+		return;
+
+	for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++) {
+		tm_vdev = fifo->vdev[i];
+		if (tm_vdev) {
+			vring = &tm_vdev->vrings[queue_id];
+			if (vring->vq)
+				mlxbf_tmfifo_rxtx(vring, is_rx);
+		}
+	}
+}
+
+/* Work handler for Rx and Tx case. */
+static void mlxbf_tmfifo_work_handler(struct work_struct *work)
+{
+	struct mlxbf_tmfifo *fifo;
+
+	fifo = container_of(work, struct mlxbf_tmfifo, work);
+	if (!fifo->is_ready)
+		return;
+
+	mutex_lock(&fifo->lock);
+
+	/* Tx (Send data to the TmFifo). */
+	mlxbf_tmfifo_work_rxtx(fifo, MLXBF_TMFIFO_VRING_TX,
+			       MLXBF_TM_TX_LWM_IRQ, false);
+
+	/* Rx (Receive data from the TmFifo). */
+	mlxbf_tmfifo_work_rxtx(fifo, MLXBF_TMFIFO_VRING_RX,
+			       MLXBF_TM_RX_HWM_IRQ, true);
+
+	mutex_unlock(&fifo->lock);
+}
+
+/* The notify function is called when new buffers are posted. */
+static bool mlxbf_tmfifo_virtio_notify(struct virtqueue *vq)
+{
+	struct mlxbf_tmfifo_vring *vring = vq->priv;
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+	struct mlxbf_tmfifo *fifo;
+	unsigned long flags;
+
+	fifo = vring->fifo;
+
+	/*
+	 * Virtio maintains vrings in pairs, even number ring for Rx
+	 * and odd number ring for Tx.
+	 */
+	if (vring->index & BIT(0)) {
+		/*
+		 * Console could make blocking call with interrupts disabled.
+		 * In such case, the vring needs to be served right away. For
+		 * other cases, just set the TX LWM bit to start Tx in the
+		 * worker handler.
+		 */
+		if (vring->vdev_id == VIRTIO_ID_CONSOLE) {
+			spin_lock_irqsave(&fifo->spin_lock, flags);
+			tm_vdev = fifo->vdev[VIRTIO_ID_CONSOLE];
+			mlxbf_tmfifo_console_output(tm_vdev, vring);
+			spin_unlock_irqrestore(&fifo->spin_lock, flags);
+		} else if (test_and_set_bit(MLXBF_TM_TX_LWM_IRQ,
+					    &fifo->pend_events)) {
+			return true;
+		}
+	} else {
+		if (test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events))
+			return true;
+	}
+
+	schedule_work(&fifo->work);
+
+	return true;
+}
+
+/* Get the array of feature bits for this device. */
+static u64 mlxbf_tmfifo_virtio_get_features(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	return tm_vdev->features;
+}
+
+/* Confirm device features to use. */
+static int mlxbf_tmfifo_virtio_finalize_features(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->features = vdev->features;
+
+	return 0;
+}
+
+/* Free virtqueues found by find_vqs(). */
+static void mlxbf_tmfifo_virtio_del_vqs(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+	struct mlxbf_tmfifo_vring *vring;
+	struct virtqueue *vq;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
+		vring = &tm_vdev->vrings[i];
+
+		/* Release the pending packet. */
+		if (vring->desc)
+			mlxbf_tmfifo_release_pending_pkt(vring);
+		vq = vring->vq;
+		if (vq) {
+			vring->vq = NULL;
+			vring_del_virtqueue(vq);
+		}
+	}
+}
+
+/* Create and initialize the virtual queues. */
+static int mlxbf_tmfifo_virtio_find_vqs(struct virtio_device *vdev,
+					unsigned int nvqs,
+					struct virtqueue *vqs[],
+					vq_callback_t *callbacks[],
+					const char * const names[],
+					const bool *ctx,
+					struct irq_affinity *desc)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+	struct mlxbf_tmfifo_vring *vring;
+	struct virtqueue *vq;
+	int i, ret, size;
+
+	if (nvqs > ARRAY_SIZE(tm_vdev->vrings))
+		return -EINVAL;
+
+	for (i = 0; i < nvqs; ++i) {
+		if (!names[i]) {
+			ret = -EINVAL;
+			goto error;
+		}
+		vring = &tm_vdev->vrings[i];
+
+		/* zero vring */
+		size = vring_size(vring->num, vring->align);
+		memset(vring->va, 0, size);
+		vq = vring_new_virtqueue(i, vring->num, vring->align, vdev,
+					 false, false, vring->va,
+					 mlxbf_tmfifo_virtio_notify,
+					 callbacks[i], names[i]);
+		if (!vq) {
+			dev_err(&vdev->dev, "vring_new_virtqueue failed\n");
+			ret = -ENOMEM;
+			goto error;
+		}
+
+		vqs[i] = vq;
+		vring->vq = vq;
+		vq->priv = vring;
+	}
+
+	return 0;
+
+error:
+	mlxbf_tmfifo_virtio_del_vqs(vdev);
+	return ret;
+}
+
+/* Read the status byte. */
+static u8 mlxbf_tmfifo_virtio_get_status(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	return tm_vdev->status;
+}
+
+/* Write the status byte. */
+static void mlxbf_tmfifo_virtio_set_status(struct virtio_device *vdev,
+					   u8 status)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->status = status;
+}
+
+/* Reset the device. Not much here for now. */
+static void mlxbf_tmfifo_virtio_reset(struct virtio_device *vdev)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	tm_vdev->status = 0;
+}
+
+/* Read the value of a configuration field. */
+static void mlxbf_tmfifo_virtio_get(struct virtio_device *vdev,
+				    unsigned int offset,
+				    void *buf,
+				    unsigned int len)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	if ((u64)offset + len > sizeof(tm_vdev->config))
+		return;
+
+	memcpy(buf, (u8 *)&tm_vdev->config + offset, len);
+}
+
+/* Write the value of a configuration field. */
+static void mlxbf_tmfifo_virtio_set(struct virtio_device *vdev,
+				    unsigned int offset,
+				    const void *buf,
+				    unsigned int len)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	if ((u64)offset + len > sizeof(tm_vdev->config))
+		return;
+
+	memcpy((u8 *)&tm_vdev->config + offset, buf, len);
+}
+
+static void tmfifo_virtio_dev_release(struct device *device)
+{
+	struct virtio_device *vdev =
+			container_of(device, struct virtio_device, dev);
+	struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
+
+	kfree(tm_vdev);
+}
+
+/* Virtio config operations. */
+static const struct virtio_config_ops mlxbf_tmfifo_virtio_config_ops = {
+	.get_features = mlxbf_tmfifo_virtio_get_features,
+	.finalize_features = mlxbf_tmfifo_virtio_finalize_features,
+	.find_vqs = mlxbf_tmfifo_virtio_find_vqs,
+	.del_vqs = mlxbf_tmfifo_virtio_del_vqs,
+	.reset = mlxbf_tmfifo_virtio_reset,
+	.set_status = mlxbf_tmfifo_virtio_set_status,
+	.get_status = mlxbf_tmfifo_virtio_get_status,
+	.get = mlxbf_tmfifo_virtio_get,
+	.set = mlxbf_tmfifo_virtio_set,
+};
+
+/* Create vdev for the FIFO. */
+static int mlxbf_tmfifo_create_vdev(struct device *dev,
+				    struct mlxbf_tmfifo *fifo,
+				    int vdev_id, u64 features,
+				    void *config, u32 size)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev, *reg_dev = NULL;
+	int ret;
+
+	mutex_lock(&fifo->lock);
+
+	tm_vdev = fifo->vdev[vdev_id];
+	if (tm_vdev) {
+		dev_err(dev, "vdev %d already exists\n", vdev_id);
+		ret = -EEXIST;
+		goto fail;
+	}
+
+	tm_vdev = kzalloc(sizeof(*tm_vdev), GFP_KERNEL);
+	if (!tm_vdev) {
+		ret = -ENOMEM;
+		goto fail;
+	}
+
+	tm_vdev->vdev.id.device = vdev_id;
+	tm_vdev->vdev.config = &mlxbf_tmfifo_virtio_config_ops;
+	tm_vdev->vdev.dev.parent = dev;
+	tm_vdev->vdev.dev.release = tmfifo_virtio_dev_release;
+	tm_vdev->features = features;
+	if (config)
+		memcpy(&tm_vdev->config, config, size);
+
+	if (mlxbf_tmfifo_alloc_vrings(fifo, tm_vdev)) {
+		dev_err(dev, "unable to allocate vring\n");
+		ret = -ENOMEM;
+		goto vdev_fail;
+	}
+
+	/* Allocate an output buffer for the console device. */
+	if (vdev_id == VIRTIO_ID_CONSOLE)
+		tm_vdev->tx_buf.buf = devm_kmalloc(dev,
+						   MLXBF_TMFIFO_CON_TX_BUF_SIZE,
+						   GFP_KERNEL);
+	fifo->vdev[vdev_id] = tm_vdev;
+
+	/* Register the virtio device. */
+	ret = register_virtio_device(&tm_vdev->vdev);
+	reg_dev = tm_vdev;
+	if (ret) {
+		dev_err(dev, "register_virtio_device failed\n");
+		goto vdev_fail;
+	}
+
+	mutex_unlock(&fifo->lock);
+	return 0;
+
+vdev_fail:
+	mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
+	fifo->vdev[vdev_id] = NULL;
+	if (reg_dev)
+		put_device(&tm_vdev->vdev.dev);
+	else
+		kfree(tm_vdev);
+fail:
+	mutex_unlock(&fifo->lock);
+	return ret;
+}
+
+/* Delete vdev for the FIFO. */
+static int mlxbf_tmfifo_delete_vdev(struct mlxbf_tmfifo *fifo, int vdev_id)
+{
+	struct mlxbf_tmfifo_vdev *tm_vdev;
+
+	mutex_lock(&fifo->lock);
+
+	/* Unregister vdev. */
+	tm_vdev = fifo->vdev[vdev_id];
+	if (tm_vdev) {
+		unregister_virtio_device(&tm_vdev->vdev);
+		mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
+		fifo->vdev[vdev_id] = NULL;
+	}
+
+	mutex_unlock(&fifo->lock);
+
+	return 0;
+}
+
+/* Read the configured network MAC address from efi variable. */
+static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
+{
+	efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
+	unsigned long size = ETH_ALEN;
+	u8 buf[ETH_ALEN];
+	efi_status_t rc;
+
+	rc = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size, buf);
+	if (rc == EFI_SUCCESS && size == ETH_ALEN)
+		ether_addr_copy(mac, buf);
+	else
+		ether_addr_copy(mac, mlxbf_tmfifo_net_default_mac);
+}
+
+/* Set TmFifo thresholds which are used to trigger interrupts. */
+static void mlxbf_tmfifo_set_threshold(struct mlxbf_tmfifo *fifo)
+{
+	u64 ctl;
+
+	/* Get Tx FIFO size and set the low/high watermark. */
+	ctl = readq(fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
+	fifo->tx_fifo_size =
+		FIELD_GET(MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK, ctl);
+	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__LWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__LWM_MASK,
+			   fifo->tx_fifo_size / 2);
+	ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__HWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_TX_CTL__HWM_MASK,
+			   fifo->tx_fifo_size - 1);
+	writeq(ctl, fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
+
+	/* Get Rx FIFO size and set the low/high watermark. */
+	ctl = readq(fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
+	fifo->rx_fifo_size =
+		FIELD_GET(MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK, ctl);
+	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__LWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__LWM_MASK, 0);
+	ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__HWM_MASK) |
+		FIELD_PREP(MLXBF_TMFIFO_RX_CTL__HWM_MASK, 1);
+	writeq(ctl, fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
+}
+
+static void mlxbf_tmfifo_cleanup(struct mlxbf_tmfifo *fifo)
+{
+	int i;
+
+	fifo->is_ready = false;
+	del_timer_sync(&fifo->timer);
+	mlxbf_tmfifo_disable_irqs(fifo);
+	cancel_work_sync(&fifo->work);
+	for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++)
+		mlxbf_tmfifo_delete_vdev(fifo, i);
+}
+
+/* Probe the TmFifo. */
+static int mlxbf_tmfifo_probe(struct platform_device *pdev)
+{
+	struct virtio_net_config net_config;
+	struct device *dev = &pdev->dev;
+	struct mlxbf_tmfifo *fifo;
+	int i, rc;
+
+	fifo = devm_kzalloc(dev, sizeof(*fifo), GFP_KERNEL);
+	if (!fifo)
+		return -ENOMEM;
+
+	spin_lock_init(&fifo->spin_lock);
+	INIT_WORK(&fifo->work, mlxbf_tmfifo_work_handler);
+	mutex_init(&fifo->lock);
+
+	/* Get the resource of the Rx FIFO. */
+	fifo->rx_base = devm_platform_ioremap_resource(pdev, 0);
+	if (IS_ERR(fifo->rx_base))
+		return PTR_ERR(fifo->rx_base);
+
+	/* Get the resource of the Tx FIFO. */
+	fifo->tx_base = devm_platform_ioremap_resource(pdev, 1);
+	if (IS_ERR(fifo->tx_base))
+		return PTR_ERR(fifo->tx_base);
+
+	platform_set_drvdata(pdev, fifo);
+
+	timer_setup(&fifo->timer, mlxbf_tmfifo_timer, 0);
+
+	for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
+		fifo->irq_info[i].index = i;
+		fifo->irq_info[i].fifo = fifo;
+		fifo->irq_info[i].irq = platform_get_irq(pdev, i);
+		rc = devm_request_irq(dev, fifo->irq_info[i].irq,
+				      mlxbf_tmfifo_irq_handler, 0,
+				      "tmfifo", &fifo->irq_info[i]);
+		if (rc) {
+			dev_err(dev, "devm_request_irq failed\n");
+			fifo->irq_info[i].irq = 0;
+			return rc;
+		}
+	}
+
+	mlxbf_tmfifo_set_threshold(fifo);
+
+	/* Create the console vdev. */
+	rc = mlxbf_tmfifo_create_vdev(dev, fifo, VIRTIO_ID_CONSOLE, 0, NULL, 0);
+	if (rc)
+		goto fail;
+
+	/* Create the network vdev. */
+	memset(&net_config, 0, sizeof(net_config));
+	net_config.mtu = ETH_DATA_LEN;
+	net_config.status = VIRTIO_NET_S_LINK_UP;
+	mlxbf_tmfifo_get_cfg_mac(net_config.mac);
+	rc = mlxbf_tmfifo_create_vdev(dev, fifo, VIRTIO_ID_NET,
+				      MLXBF_TMFIFO_NET_FEATURES, &net_config,
+				      sizeof(net_config));
+	if (rc)
+		goto fail;
+
+	mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
+
+	fifo->is_ready = true;
+	return 0;
+
+fail:
+	mlxbf_tmfifo_cleanup(fifo);
+	return rc;
+}
+
+/* Device remove function. */
+static int mlxbf_tmfifo_remove(struct platform_device *pdev)
+{
+	struct mlxbf_tmfifo *fifo = platform_get_drvdata(pdev);
+
+	mlxbf_tmfifo_cleanup(fifo);
+
+	return 0;
+}
+
+static const struct acpi_device_id mlxbf_tmfifo_acpi_match[] = {
+	{ "MLNXBF01", 0 },
+	{}
+};
+MODULE_DEVICE_TABLE(acpi, mlxbf_tmfifo_acpi_match);
+
+static struct platform_driver mlxbf_tmfifo_driver = {
+	.probe = mlxbf_tmfifo_probe,
+	.remove = mlxbf_tmfifo_remove,
+	.driver = {
+		.name = "bf-tmfifo",
+		.acpi_match_table = mlxbf_tmfifo_acpi_match,
+	},
+};
+
+module_platform_driver(mlxbf_tmfifo_driver);
+
+MODULE_DESCRIPTION("Mellanox BlueField SoC TmFifo Driver");
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR("Mellanox Technologies");
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v16] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc
  2019-05-03 13:49 ` [PATCH v16] " Liming Sun
@ 2019-05-06  9:13   ` Andy Shevchenko
  0 siblings, 0 replies; 30+ messages in thread
From: Andy Shevchenko @ 2019-05-06  9:13 UTC (permalink / raw)
  To: Liming Sun
  Cc: David Woods, Andy Shevchenko, Darren Hart, Vadim Pasternak,
	Linux Kernel Mailing List, Platform Driver

On Fri, May 3, 2019 at 4:49 PM Liming Sun <lsun@mellanox.com> wrote:
>
> This commit adds the TmFifo platform driver for Mellanox BlueField
> Soc. TmFifo is a shared FIFO which enables external host machine
> to exchange data with the SoC via USB or PCIe. The driver is based
> on virtio framework and has console and network access enabled.
>

Pushed to my review and testing queue, thanks!

> Reviewed-by: Vadim Pasternak <vadimp@mellanox.com>
> Signed-off-by: Liming Sun <lsun@mellanox.com>
> ---
> v15->v16:
>     Rebase and resubmit (no new changes).
> v14->v15:
>     Fixes for comments from Andy:
>     - Remove the 'union' definition of mlxbf_tmfifo_msg_hdr and use
>       on-the-fly conversion when sending the 8-byte message header
>       into the FIFO;
>     - Update comment of mlxbf_tmfifo_msg_hdr explaining why '__be16'
>       is needed for the 'len' field. The SoC sends data stream into
>       the FIFO and the other side reads it. The byte order of the data
>       stream (byte-stream) stays the same. The 'len' field is encoded
>       into network byte order so upper-level applications in external
>       host machine with different endianness could decode it. This
>       implementation was verified over USB with an external PPC host
>       machine running in big-endian mode.
>     - Move the 'dev_err()' line to the end of the block in function
>       mlxbf_tmfifo_alloc_vrings();
>     - Remove the 'irq_info->index < MLXBF_TM_MAX_IRQ' check in
>       mlxbf_tmfifo_irq_handler() since it's unnecessary;
>     - Remove the 'if (desc_head)' check in
>       mlxbf_tmfifo_release_pending_pkt() since function
>       mlxbf_tmfifo_get_pkt_len() is already NULL-aware;
>     - Adjust the testing order of 'if (!(vring->index & BIT(0)))'
>       in bool mlxbf_tmfifo_virtio_notify() to test the positive case
>       'if (vring->index & BIT(0))' first;
>     - Add '(u64)offset' conversion in mlxbf_tmfifo_virtio_get() to
>       avoid 32-bit length addition overflow;
>     - Update the 'efi.get_variable' statement into single line in
>       mlxbf_tmfifo_get_cfg_mac();
>     - Use new helper devm_platform_ioremap_resource() to replace
>       'platform_get_resource() + devm_ioremap_resource()' in
>       mlxbf_tmfifo_probe();
> v13->v14:
>     Fixes for comments from Andy:
>     - Add a blank line to separate the virtio header files;
>     - Update the comment for 'union mlxbf_tmfifo_msg_hdr' to be
>       more clear how this union is used;
>     - Update the 'mlxbf_tmfifo_net_default_mac[ETH_ALEN]' definition
>       to be two lines;
>     - Reformat macro MLXBF_TMFIFO_NET_FEATURES to put the definition
>       in a seperate line;
>     - Update all 'fifo' to 'FIFO' in the comments;
>     - Update mlxbf_tmfifo_alloc_vrings() to specifically release the
>       allocated entries in case of failures, so the logic looks more
>       clear. In the caller function the mlxbf_tmfifo_free_vrings()
>       might be called again in case of other failures, which is ok
>       since the 'va' pointer will be set to NULL once released;
>     - Update mlxbf_tmfifo_timer() to change the first statement to
>       one line;
>     - Update one memcpy() to ether_addr_copy() in
>       mlxbf_tmfifo_get_cfg_mac();
>     - Remove 'fifo->pdev' since it is really not needed;
>     - Define temporary variable to update the mlxbf_tmfifo_create_vdev()
>       statement into single line.
>     New changes by Liming:
>     - Reorder the logic a little bit in mlxbf_tmfifo_timer(). Previously
>       it has logic like "!a || !b" while the '!b' will not be evaluated
>       if '!a' is true. It was changed to this way during review, but is
>       actually not the desired behavior since both bits need to be
>       tested/set in fifo->pend_events. This issue was found during
>       verification which caused extra delays for Tx packets.
> v12->v13:
>     Rebase and resubmit (no new changes).
> v11->v12:
>     Fixed the two unsolved comments from v11.
>     - "Change macro mlxbf_vdev_to_tmfifo() to one line"
>       Done. Seems not hard.
>     - "Is it appropriate use of devm_* for 'tm_vdev = devm_kzalloc'"
>       Yes, understand the comment now. The tmfifo is fixed, but the
>       vdev is dynamic. Use kzalloc() instead, and free the device
>       in the release callback which is the right place for it.
> v10->v11:
>     Fixes for comments from Andy:
>     - Use GENMASK_ULL() instead of GENMASK() in mlxbf-tmfifo-regs.h
>     - Removed the cpu_to_le64()/le64_to_cpu() conversion since
>       readq()/writeq() already takes care of it.
>     - Remove the "if (irq)" check in mlxbf_tmfifo_disable_irqs().
>     - Add "u32 count" temp variable in mlxbf_tmfifo_get_tx_avail().
>     - Clean up mlxbf_tmfifo_get_cfg_mac(), use ETH_ALEN instead of
>       value 6.
>     - Change the tx_buf to use Linux existing 'struct circ_buf'.
>     Comment not applied:
>     - "Change macro mlxbf_vdev_to_tmfifo() to one line"
>       Couldn't fit in one line with 80 chracters
>     - "Is it appropriate use of devm_* for 'tm_vdev = devm_kzalloc'"
>       This is SoC, the device won't be closed or detached.
>       The only case is when the driver is unloaded. So it appears
>       ok to use devm_kzalloc() since it's allocated during probe()
>       and released during module unload.
>     Comments from Vadim: OK
> v9->v10:
>     Fixes for comments from Andy:
>     - Use devm_ioremap_resource() instead of devm_ioremap().
>     - Use kernel-doc comments.
>     - Keep Makefile contents sorted.
>     - Use same fixed format for offsets.
>     - Use SZ_1K/SZ_32K instead of 1024/23*1024.
>     - Remove unnecessary comments.
>     - Use one style for max numbers.
>     - More comments for mlxbf_tmfifo_vdev and mlxbf_tmfifo_data_64bit.
>     - Use globally defined MTU instead of new definition.
>     - Remove forward declaration of mlxbf_tmfifo_remove().
>     - Remove PAGE_ALIGN() for dma_alloc_coherent)().
>     - Remove the cast of "struct vring *".
>     - Check return result of test_and_set_bit().
>     - Add a macro mlxbt_vdev_to_tmfifo().
>     - Several other minor coding style comments.
>     Comment not applied:
>     - "Shouldn't be rather helper in EFI lib in kernel"
>       Looks like efi.get_variable() is the way I found in the kernel
>       tree.
>     - "this one is not protected anyhow? Potential race condition"
>       In mlxbf_tmfifo_console_tx(), the spin-lock is used to protect the
>       'tx_buf' only, not the FIFO writes. So there is no race condition.
>     - "Is __packed needed in mlxbf_tmfifo_msg_hdr".
>       Yes, it is needed to make sure the structure is 8 bytes.
>     Fixes for comments from Vadim:
>     - Use tab in mlxbf-tmfifo-regs.h
>     - Use kernel-doc comments for struct mlxbf_tmfifo_msg_hdr and
>       mlxbf_tmfifo_irq_info as well.
>     - Use _MAX instead of _CNT in the macro definition to be consistent.
>     - Fix the MODULE_LICENSE.
>     - Use BIT_ULL() instead of BIT().
>     - Remove argument of 'avail' for mlxbf_tmfifo_rxtx_header() and
>       mlxbf_tmfifo_rxtx_word()
>     - Revise logic in mlxbf_tmfifo_rxtx_one_desc() to remove the
>       WARN_ON().
>     - Change "union mlxbf_tmfifo_u64 u" to "union mlxbf_tmfifo_u64 buf"
>       in mlxbf_tmfifo_rxtx_word().
>     - Change date type of vring_change from 'int' to 'bool'.
>     - Remove the blank lines after Signed-off.
>     - Don’t use declaration in the middle.
>     - Make the network header initialization in some more elegant way.
>     - Change label done to mlxbf_tmfifo_desc_done.
>     - Remove some unnecessary comments, and several other misc coding
>       style comments.
>     - Simplify code logic in mlxbf_tmfifo_virtio_notify()
>     New changes by Liming:
>     - Simplify the Rx/Tx function arguments to make it more readable.
> v8->v9:
>     Fixes for comments from Andy:
>     - Use modern devm_xxx() API instead.
>     Fixes for comments from Vadim:
>     - Split the Rx/Tx function into smaller funcitons.
>     - File name, copyright information.
>     - Function and variable name conversion.
>     - Local variable and indent coding styles.
>     - Remove unnecessary 'inline' declarations.
>     - Use devm_xxx() APIs.
>     - Move the efi_char16_t MAC address definition to global.
>     - Fix warnings reported by 'checkpatch --strict'.
>     - Fix warnings reported by 'make CF="-D__CHECK_ENDIAN__"'.
>     - Change select VIRTIO_xxx to depends on  VIRTIO_ in Kconfig.
>     - Merge mlxbf_tmfifo_vdev_tx_buf_push() and
>       mlxbf_tmfifo_vdev_tx_buf_pop().
>     - Add union to avoid casting between __le64 and u64.
>     - Several other misc coding style comments.
>     New changes by Liming:
>     - Removed the DT binding documentation since only ACPI is
>       supported for now by UEFI on the SoC.
> v8: Re-submit under drivers/platform/mellanox for the target-side
>     platform driver only.
> v7: Added host side drivers into the same patch set.
> v5~v6: Coding style fix.
> v1~v4: Initial version for directory drivers/soc/mellanox.
> ---
>  drivers/platform/mellanox/Kconfig             |   12 +-
>  drivers/platform/mellanox/Makefile            |    1 +
>  drivers/platform/mellanox/mlxbf-tmfifo-regs.h |   63 ++
>  drivers/platform/mellanox/mlxbf-tmfifo.c      | 1281 +++++++++++++++++++++++++
>  4 files changed, 1356 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo-regs.h
>  create mode 100644 drivers/platform/mellanox/mlxbf-tmfifo.c
>
> diff --git a/drivers/platform/mellanox/Kconfig b/drivers/platform/mellanox/Kconfig
> index cd8a908..530fe7e 100644
> --- a/drivers/platform/mellanox/Kconfig
> +++ b/drivers/platform/mellanox/Kconfig
> @@ -5,7 +5,7 @@
>
>  menuconfig MELLANOX_PLATFORM
>         bool "Platform support for Mellanox hardware"
> -       depends on X86 || ARM || COMPILE_TEST
> +       depends on X86 || ARM || ARM64 || COMPILE_TEST
>         ---help---
>           Say Y here to get to see options for platform support for
>           Mellanox systems. This option alone does not add any kernel code.
> @@ -34,4 +34,14 @@ config MLXREG_IO
>           to system resets operation, system reset causes monitoring and some
>           kinds of mux selection.
>
> +config MLXBF_TMFIFO
> +       tristate "Mellanox BlueField SoC TmFifo platform driver"
> +       depends on ARM64
> +       depends on ACPI
> +       depends on VIRTIO_CONSOLE && VIRTIO_NET
> +       help
> +         Say Y here to enable TmFifo support. The TmFifo driver provides
> +         a platform driver for the TmFifo, which supports console and
> +         networking based on the virtio framework.
> +
>  endif # MELLANOX_PLATFORM
> diff --git a/drivers/platform/mellanox/Makefile b/drivers/platform/mellanox/Makefile
> index 57074d9c..a229bda1 100644
> --- a/drivers/platform/mellanox/Makefile
> +++ b/drivers/platform/mellanox/Makefile
> @@ -3,5 +3,6 @@
>  # Makefile for linux/drivers/platform/mellanox
>  # Mellanox Platform-Specific Drivers
>  #
> +obj-$(CONFIG_MLXBF_TMFIFO)     += mlxbf-tmfifo.o
>  obj-$(CONFIG_MLXREG_HOTPLUG)   += mlxreg-hotplug.o
>  obj-$(CONFIG_MLXREG_IO) += mlxreg-io.o
> diff --git a/drivers/platform/mellanox/mlxbf-tmfifo-regs.h b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
> new file mode 100644
> index 0000000..e4f0d2e
> --- /dev/null
> +++ b/drivers/platform/mellanox/mlxbf-tmfifo-regs.h
> @@ -0,0 +1,63 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
> + */
> +
> +#ifndef __MLXBF_TMFIFO_REGS_H__
> +#define __MLXBF_TMFIFO_REGS_H__
> +
> +#include <linux/types.h>
> +#include <linux/bits.h>
> +
> +#define MLXBF_TMFIFO_TX_DATA                           0x00
> +#define MLXBF_TMFIFO_TX_STS                            0x08
> +#define MLXBF_TMFIFO_TX_STS__LENGTH                    0x0001
> +#define MLXBF_TMFIFO_TX_STS__COUNT_SHIFT               0
> +#define MLXBF_TMFIFO_TX_STS__COUNT_WIDTH               9
> +#define MLXBF_TMFIFO_TX_STS__COUNT_RESET_VAL           0
> +#define MLXBF_TMFIFO_TX_STS__COUNT_RMASK               GENMASK_ULL(8, 0)
> +#define MLXBF_TMFIFO_TX_STS__COUNT_MASK                        GENMASK_ULL(8, 0)
> +#define MLXBF_TMFIFO_TX_CTL                            0x10
> +#define MLXBF_TMFIFO_TX_CTL__LENGTH                    0x0001
> +#define MLXBF_TMFIFO_TX_CTL__LWM_SHIFT                 0
> +#define MLXBF_TMFIFO_TX_CTL__LWM_WIDTH                 8
> +#define MLXBF_TMFIFO_TX_CTL__LWM_RESET_VAL             128
> +#define MLXBF_TMFIFO_TX_CTL__LWM_RMASK                 GENMASK_ULL(7, 0)
> +#define MLXBF_TMFIFO_TX_CTL__LWM_MASK                  GENMASK_ULL(7, 0)
> +#define MLXBF_TMFIFO_TX_CTL__HWM_SHIFT                 8
> +#define MLXBF_TMFIFO_TX_CTL__HWM_WIDTH                 8
> +#define MLXBF_TMFIFO_TX_CTL__HWM_RESET_VAL             128
> +#define MLXBF_TMFIFO_TX_CTL__HWM_RMASK                 GENMASK_ULL(7, 0)
> +#define MLXBF_TMFIFO_TX_CTL__HWM_MASK                  GENMASK_ULL(15, 8)
> +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_SHIFT         32
> +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_WIDTH         9
> +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RESET_VAL     256
> +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RMASK         GENMASK_ULL(8, 0)
> +#define MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK          GENMASK_ULL(40, 32)
> +#define MLXBF_TMFIFO_RX_DATA                           0x00
> +#define MLXBF_TMFIFO_RX_STS                            0x08
> +#define MLXBF_TMFIFO_RX_STS__LENGTH                    0x0001
> +#define MLXBF_TMFIFO_RX_STS__COUNT_SHIFT               0
> +#define MLXBF_TMFIFO_RX_STS__COUNT_WIDTH               9
> +#define MLXBF_TMFIFO_RX_STS__COUNT_RESET_VAL           0
> +#define MLXBF_TMFIFO_RX_STS__COUNT_RMASK               GENMASK_ULL(8, 0)
> +#define MLXBF_TMFIFO_RX_STS__COUNT_MASK                        GENMASK_ULL(8, 0)
> +#define MLXBF_TMFIFO_RX_CTL                            0x10
> +#define MLXBF_TMFIFO_RX_CTL__LENGTH                    0x0001
> +#define MLXBF_TMFIFO_RX_CTL__LWM_SHIFT                 0
> +#define MLXBF_TMFIFO_RX_CTL__LWM_WIDTH                 8
> +#define MLXBF_TMFIFO_RX_CTL__LWM_RESET_VAL             128
> +#define MLXBF_TMFIFO_RX_CTL__LWM_RMASK                 GENMASK_ULL(7, 0)
> +#define MLXBF_TMFIFO_RX_CTL__LWM_MASK                  GENMASK_ULL(7, 0)
> +#define MLXBF_TMFIFO_RX_CTL__HWM_SHIFT                 8
> +#define MLXBF_TMFIFO_RX_CTL__HWM_WIDTH                 8
> +#define MLXBF_TMFIFO_RX_CTL__HWM_RESET_VAL             128
> +#define MLXBF_TMFIFO_RX_CTL__HWM_RMASK                 GENMASK_ULL(7, 0)
> +#define MLXBF_TMFIFO_RX_CTL__HWM_MASK                  GENMASK_ULL(15, 8)
> +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_SHIFT         32
> +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_WIDTH         9
> +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RESET_VAL     256
> +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_RMASK         GENMASK_ULL(8, 0)
> +#define MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK          GENMASK_ULL(40, 32)
> +
> +#endif /* !defined(__MLXBF_TMFIFO_REGS_H__) */
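
The *_SHIFT/*_WIDTH/*_MASK values above are consumed with FIELD_GET() and
FIELD_PREP() from <linux/bitfield.h>, as mlxbf-tmfifo.c does below. A minimal
sketch (the helper name here is illustrative, not from the patch):

        #include <linux/bitfield.h>
        #include <linux/io.h>

        /* Sketch: how many words currently sit in the Tx FIFO. */
        static int tx_fifo_level(void __iomem *tx_base)
        {
                u64 sts = readq(tx_base + MLXBF_TMFIFO_TX_STS);

                return FIELD_GET(MLXBF_TMFIFO_TX_STS__COUNT_MASK, sts);
        }
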
> diff --git a/drivers/platform/mellanox/mlxbf-tmfifo.c b/drivers/platform/mellanox/mlxbf-tmfifo.c
> new file mode 100644
> index 0000000..9a5c9fd
> --- /dev/null
> +++ b/drivers/platform/mellanox/mlxbf-tmfifo.c
> @@ -0,0 +1,1281 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Mellanox BlueField SoC TmFifo driver
> + *
> + * Copyright (C) 2019 Mellanox Technologies
> + */
> +
> +#include <linux/acpi.h>
> +#include <linux/bitfield.h>
> +#include <linux/circ_buf.h>
> +#include <linux/efi.h>
> +#include <linux/irq.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/platform_device.h>
> +#include <linux/types.h>
> +
> +#include <linux/virtio_config.h>
> +#include <linux/virtio_console.h>
> +#include <linux/virtio_ids.h>
> +#include <linux/virtio_net.h>
> +#include <linux/virtio_ring.h>
> +
> +#include "mlxbf-tmfifo-regs.h"
> +
> +/* Vring size. */
> +#define MLXBF_TMFIFO_VRING_SIZE                        SZ_1K
> +
> +/* Console Tx buffer size. */
> +#define MLXBF_TMFIFO_CON_TX_BUF_SIZE           SZ_32K
> +
> +/* Console Tx buffer reserved space. */
> +#define MLXBF_TMFIFO_CON_TX_BUF_RSV_SIZE       8
> +
> +/* House-keeping timer interval. */
> +#define MLXBF_TMFIFO_TIMER_INTERVAL            (HZ / 10)
> +
> +/* Virtual devices sharing the TM FIFO. */
> +#define MLXBF_TMFIFO_VDEV_MAX          (VIRTIO_ID_CONSOLE + 1)
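
Since VIRTIO_ID_NET is 1 and VIRTIO_ID_CONSOLE is 3, this yields, for example,
a four-entry vdev array indexed directly by the VIRTIO_ID_xxx value; the
unused slots simply stay NULL.
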
> +
> +/*
> + * Reserve 1/16 of TmFifo space, so console messages are not starved by
> + * the networking traffic.
> + */
> +#define MLXBF_TMFIFO_RESERVE_RATIO             16
> +
> +/* Message with data needs at least two words (for header & data). */
> +#define MLXBF_TMFIFO_DATA_MIN_WORDS            2
> +
> +struct mlxbf_tmfifo;
> +
> +/**
> + * mlxbf_tmfifo_vring - Structure of the TmFifo virtual ring
> + * @va: virtual address of the ring
> + * @dma: dma address of the ring
> + * @vq: pointer to the virtio virtqueue
> + * @desc: current descriptor of the pending packet
> + * @desc_head: head descriptor of the pending packet
> + * @cur_len: processed length of the current descriptor
> + * @rem_len: remaining length of the pending packet
> + * @pkt_len: total length of the pending packet
> + * @next_avail: next avail descriptor id
> + * @num: vring size (number of descriptors)
> + * @align: vring alignment size
> + * @index: vring index
> + * @vdev_id: vring virtio id (VIRTIO_ID_xxx)
> + * @fifo: pointer to the tmfifo structure
> + */
> +struct mlxbf_tmfifo_vring {
> +       void *va;
> +       dma_addr_t dma;
> +       struct virtqueue *vq;
> +       struct vring_desc *desc;
> +       struct vring_desc *desc_head;
> +       int cur_len;
> +       int rem_len;
> +       u32 pkt_len;
> +       u16 next_avail;
> +       int num;
> +       int align;
> +       int index;
> +       int vdev_id;
> +       struct mlxbf_tmfifo *fifo;
> +};
> +
> +/* Interrupt types. */
> +enum {
> +       MLXBF_TM_RX_LWM_IRQ,
> +       MLXBF_TM_RX_HWM_IRQ,
> +       MLXBF_TM_TX_LWM_IRQ,
> +       MLXBF_TM_TX_HWM_IRQ,
> +       MLXBF_TM_MAX_IRQ
> +};
> +
> +/* Ring types (Rx & Tx). */
> +enum {
> +       MLXBF_TMFIFO_VRING_RX,
> +       MLXBF_TMFIFO_VRING_TX,
> +       MLXBF_TMFIFO_VRING_MAX
> +};
> +
> +/**
> + * mlxbf_tmfifo_vdev - Structure of the TmFifo virtual device
> + * @vdev: virtio device, in which the vdev.id.device field has the
> + *        VIRTIO_ID_xxx id to distinguish the virtual device.
> + * @status: status of the device
> + * @features: supported features of the device
> + * @vrings: array of tmfifo vrings of this device
> + * @config.cons: virtual console config -
> + *               select if vdev.id.device is VIRTIO_ID_CONSOLE
> + * @config.net: virtual network config -
> + *              select if vdev.id.device is VIRTIO_ID_NET
> + * @tx_buf: tx buffer used to buffer data before writing into the FIFO
> + */
> +struct mlxbf_tmfifo_vdev {
> +       struct virtio_device vdev;
> +       u8 status;
> +       u64 features;
> +       struct mlxbf_tmfifo_vring vrings[MLXBF_TMFIFO_VRING_MAX];
> +       union {
> +               struct virtio_console_config cons;
> +               struct virtio_net_config net;
> +       } config;
> +       struct circ_buf tx_buf;
> +};
> +
> +/**
> + * mlxbf_tmfifo_irq_info - Structure of the interrupt information
> + * @fifo: pointer to the tmfifo structure
> + * @irq: interrupt number
> + * @index: index into the interrupt array
> + */
> +struct mlxbf_tmfifo_irq_info {
> +       struct mlxbf_tmfifo *fifo;
> +       int irq;
> +       int index;
> +};
> +
> +/**
> + * mlxbf_tmfifo - Structure of the TmFifo
> + * @vdev: array of the virtual devices running over the TmFifo
> + * @lock: lock to protect the TmFifo access
> + * @rx_base: mapped register base address for the Rx FIFO
> + * @tx_base: mapped register base address for the Tx FIFO
> + * @rx_fifo_size: number of entries of the Rx FIFO
> + * @tx_fifo_size: number of entries of the Tx FIFO
> + * @pend_events: pending bits for deferred events
> + * @irq_info: interrupt information
> + * @work: work struct for deferred process
> + * @timer: background timer
> + * @vring: Tx/Rx ring
> + * @spin_lock: spin lock
> + * @is_ready: ready flag
> + */
> +struct mlxbf_tmfifo {
> +       struct mlxbf_tmfifo_vdev *vdev[MLXBF_TMFIFO_VDEV_MAX];
> +       struct mutex lock;              /* TmFifo lock */
> +       void __iomem *rx_base;
> +       void __iomem *tx_base;
> +       int rx_fifo_size;
> +       int tx_fifo_size;
> +       unsigned long pend_events;
> +       struct mlxbf_tmfifo_irq_info irq_info[MLXBF_TM_MAX_IRQ];
> +       struct work_struct work;
> +       struct timer_list timer;
> +       struct mlxbf_tmfifo_vring *vring[2];
> +       spinlock_t spin_lock;           /* spin lock */
> +       bool is_ready;
> +};
> +
> +/**
> + * mlxbf_tmfifo_msg_hdr - Structure of the TmFifo message header
> + * @type: message type
> + * @len: payload length in network byte order. Messages sent into the FIFO
> + *       are read by the other side as a byte stream in the same order, so
> + *       the length is encoded in network byte order to let both sides
> + *       interpret it consistently.
> + */
> +struct mlxbf_tmfifo_msg_hdr {
> +       u8 type;
> +       __be16 len;
> +       u8 unused[5];
> +} __packed __aligned(sizeof(u64));
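
The external host reads each message starting with this 8-byte header. The
host-side rshim driver is outside this series, but a rough decode sketch,
assuming the same layout (the FIFO read helper is hypothetical):

        /* Sketch only: hypothetical host-side decode of the header word. */
        struct mlxbf_tmfifo_msg_hdr hdr;
        u16 payload_len;
        u64 word = read_rshim_fifo_word();      /* hypothetical helper */

        memcpy(&hdr, &word, sizeof(hdr));
        payload_len = ntohs(hdr.len);           /* sent in network byte order */
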
> +
> +/*
> + * Default MAC.
> + * This MAC address will be read from EFI persistent variable if configured.
> + * It can also be reconfigured with standard Linux tools.
> + */
> +static u8 mlxbf_tmfifo_net_default_mac[ETH_ALEN] = {
> +       0x00, 0x1A, 0xCA, 0xFF, 0xFF, 0x01
> +};
> +
> +/* EFI variable name of the MAC address. */
> +static efi_char16_t mlxbf_tmfifo_efi_name[] = L"RshimMacAddr";
> +
> +/* Maximum L2 header length. */
> +#define MLXBF_TMFIFO_NET_L2_OVERHEAD   36
> +
> +/* Supported virtio-net features. */
> +#define MLXBF_TMFIFO_NET_FEATURES \
> +       (BIT_ULL(VIRTIO_NET_F_MTU) | BIT_ULL(VIRTIO_NET_F_STATUS) | \
> +        BIT_ULL(VIRTIO_NET_F_MAC))
> +
> +#define mlxbf_vdev_to_tmfifo(d) container_of(d, struct mlxbf_tmfifo_vdev, vdev)
> +
> +/* Free vrings of the FIFO device. */
> +static void mlxbf_tmfifo_free_vrings(struct mlxbf_tmfifo *fifo,
> +                                    struct mlxbf_tmfifo_vdev *tm_vdev)
> +{
> +       struct mlxbf_tmfifo_vring *vring;
> +       int i, size;
> +
> +       for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
> +               vring = &tm_vdev->vrings[i];
> +               if (vring->va) {
> +                       size = vring_size(vring->num, vring->align);
> +                       dma_free_coherent(tm_vdev->vdev.dev.parent, size,
> +                                         vring->va, vring->dma);
> +                       vring->va = NULL;
> +                       if (vring->vq) {
> +                               vring_del_virtqueue(vring->vq);
> +                               vring->vq = NULL;
> +                       }
> +               }
> +       }
> +}
> +
> +/* Allocate vrings for the FIFO. */
> +static int mlxbf_tmfifo_alloc_vrings(struct mlxbf_tmfifo *fifo,
> +                                    struct mlxbf_tmfifo_vdev *tm_vdev)
> +{
> +       struct mlxbf_tmfifo_vring *vring;
> +       struct device *dev;
> +       dma_addr_t dma;
> +       int i, size;
> +       void *va;
> +
> +       for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
> +               vring = &tm_vdev->vrings[i];
> +               vring->fifo = fifo;
> +               vring->num = MLXBF_TMFIFO_VRING_SIZE;
> +               vring->align = SMP_CACHE_BYTES;
> +               vring->index = i;
> +               vring->vdev_id = tm_vdev->vdev.id.device;
> +               dev = &tm_vdev->vdev.dev;
> +
> +               size = vring_size(vring->num, vring->align);
> +               va = dma_alloc_coherent(dev->parent, size, &dma, GFP_KERNEL);
> +               if (!va) {
> +                       mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
> +                       dev_err(dev->parent, "dma_alloc_coherent failed\n");
> +                       return -ENOMEM;
> +               }
> +
> +               vring->va = va;
> +               vring->dma = dma;
> +       }
> +
> +       return 0;
> +}
> +
> +/* Disable interrupts of the FIFO device. */
> +static void mlxbf_tmfifo_disable_irqs(struct mlxbf_tmfifo *fifo)
> +{
> +       int i, irq;
> +
> +       for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
> +               irq = fifo->irq_info[i].irq;
> +               fifo->irq_info[i].irq = 0;
> +               disable_irq(irq);
> +       }
> +}
> +
> +/* Interrupt handler. */
> +static irqreturn_t mlxbf_tmfifo_irq_handler(int irq, void *arg)
> +{
> +       struct mlxbf_tmfifo_irq_info *irq_info = arg;
> +
> +       if (!test_and_set_bit(irq_info->index, &irq_info->fifo->pend_events))
> +               schedule_work(&irq_info->fifo->work);
> +
> +       return IRQ_HANDLED;
> +}
> +
> +/* Get the next packet descriptor from the vring. */
> +static struct vring_desc *
> +mlxbf_tmfifo_get_next_desc(struct mlxbf_tmfifo_vring *vring)
> +{
> +       const struct vring *vr = virtqueue_get_vring(vring->vq);
> +       struct virtio_device *vdev = vring->vq->vdev;
> +       unsigned int idx, head;
> +
> +       if (vring->next_avail == virtio16_to_cpu(vdev, vr->avail->idx))
> +               return NULL;
> +
> +       idx = vring->next_avail % vr->num;
> +       head = virtio16_to_cpu(vdev, vr->avail->ring[idx]);
> +       if (WARN_ON(head >= vr->num))
> +               return NULL;
> +
> +       vring->next_avail++;
> +
> +       return &vr->desc[head];
> +}
> +
> +/* Release virtio descriptor. */
> +static void mlxbf_tmfifo_release_desc(struct mlxbf_tmfifo_vring *vring,
> +                                     struct vring_desc *desc, u32 len)
> +{
> +       const struct vring *vr = virtqueue_get_vring(vring->vq);
> +       struct virtio_device *vdev = vring->vq->vdev;
> +       u16 idx, vr_idx;
> +
> +       vr_idx = virtio16_to_cpu(vdev, vr->used->idx);
> +       idx = vr_idx % vr->num;
> +       vr->used->ring[idx].id = cpu_to_virtio32(vdev, desc - vr->desc);
> +       vr->used->ring[idx].len = cpu_to_virtio32(vdev, len);
> +
> +       /*
> +        * Virtio could poll and check the 'idx' to decide whether the desc is
> +        * done or not. Add a memory barrier here to make sure the update above
> +        * completes before updating the idx.
> +        */
> +       mb();
> +       vr->used->idx = cpu_to_virtio16(vdev, vr_idx + 1);
> +}
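
The mb() above pairs with a read barrier on the consuming side. Roughly what
the generic split-vring code does when it picks up the used entry (a
simplified sketch reusing 'vdev' and 'vr' from the function above; not part
of the patch):

        u16 last_used_idx = 0;                  /* tracked by the consumer */
        struct vring_used_elem *elem;

        if (last_used_idx != virtio16_to_cpu(vdev, vr->used->idx)) {
                virtio_rmb(false);              /* pairs with the mb() above */
                elem = &vr->used->ring[last_used_idx % vr->num];
                /* elem->id and elem->len are now safe to read */
        }
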
> +
> +/* Get the total length of the descriptor chain. */
> +static u32 mlxbf_tmfifo_get_pkt_len(struct mlxbf_tmfifo_vring *vring,
> +                                   struct vring_desc *desc)
> +{
> +       const struct vring *vr = virtqueue_get_vring(vring->vq);
> +       struct virtio_device *vdev = vring->vq->vdev;
> +       u32 len = 0, idx;
> +
> +       while (desc) {
> +               len += virtio32_to_cpu(vdev, desc->len);
> +               if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
> +                       break;
> +               idx = virtio16_to_cpu(vdev, desc->next);
> +               desc = &vr->desc[idx];
> +       }
> +
> +       return len;
> +}
> +
> +static void mlxbf_tmfifo_release_pending_pkt(struct mlxbf_tmfifo_vring *vring)
> +{
> +       struct vring_desc *desc_head;
> +       u32 len = 0;
> +
> +       if (vring->desc_head) {
> +               desc_head = vring->desc_head;
> +               len = vring->pkt_len;
> +       } else {
> +               desc_head = mlxbf_tmfifo_get_next_desc(vring);
> +               len = mlxbf_tmfifo_get_pkt_len(vring, desc_head);
> +       }
> +
> +       if (desc_head)
> +               mlxbf_tmfifo_release_desc(vring, desc_head, len);
> +
> +       vring->pkt_len = 0;
> +       vring->desc = NULL;
> +       vring->desc_head = NULL;
> +}
> +
> +static void mlxbf_tmfifo_init_net_desc(struct mlxbf_tmfifo_vring *vring,
> +                                      struct vring_desc *desc, bool is_rx)
> +{
> +       struct virtio_device *vdev = vring->vq->vdev;
> +       struct virtio_net_hdr *net_hdr;
> +
> +       net_hdr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
> +       memset(net_hdr, 0, sizeof(*net_hdr));
> +}
> +
> +/* Get and initialize the next packet. */
> +static struct vring_desc *
> +mlxbf_tmfifo_get_next_pkt(struct mlxbf_tmfifo_vring *vring, bool is_rx)
> +{
> +       struct vring_desc *desc;
> +
> +       desc = mlxbf_tmfifo_get_next_desc(vring);
> +       if (desc && is_rx && vring->vdev_id == VIRTIO_ID_NET)
> +               mlxbf_tmfifo_init_net_desc(vring, desc, is_rx);
> +
> +       vring->desc_head = desc;
> +       vring->desc = desc;
> +
> +       return desc;
> +}
> +
> +/* House-keeping timer. */
> +static void mlxbf_tmfifo_timer(struct timer_list *t)
> +{
> +       struct mlxbf_tmfifo *fifo = container_of(t, struct mlxbf_tmfifo, timer);
> +       int rx, tx;
> +
> +       rx = !test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events);
> +       tx = !test_and_set_bit(MLXBF_TM_TX_LWM_IRQ, &fifo->pend_events);
> +
> +       if (rx || tx)
> +               schedule_work(&fifo->work);
> +
> +       mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
> +}
> +
> +/* Copy one console packet into the output buffer. */
> +static void mlxbf_tmfifo_console_output_one(struct mlxbf_tmfifo_vdev *cons,
> +                                           struct mlxbf_tmfifo_vring *vring,
> +                                           struct vring_desc *desc)
> +{
> +       const struct vring *vr = virtqueue_get_vring(vring->vq);
> +       struct virtio_device *vdev = &cons->vdev;
> +       u32 len, idx, seg;
> +       void *addr;
> +
> +       while (desc) {
> +               addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
> +               len = virtio32_to_cpu(vdev, desc->len);
> +
> +               seg = CIRC_SPACE_TO_END(cons->tx_buf.head, cons->tx_buf.tail,
> +                                       MLXBF_TMFIFO_CON_TX_BUF_SIZE);
> +               if (len <= seg) {
> +                       memcpy(cons->tx_buf.buf + cons->tx_buf.head, addr, len);
> +               } else {
> +                       memcpy(cons->tx_buf.buf + cons->tx_buf.head, addr, seg);
> +                       addr += seg;
> +                       memcpy(cons->tx_buf.buf, addr, len - seg);
> +               }
> +               cons->tx_buf.head = (cons->tx_buf.head + len) %
> +                       MLXBF_TMFIFO_CON_TX_BUF_SIZE;
> +
> +               if (!(virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT))
> +                       break;
> +               idx = virtio16_to_cpu(vdev, desc->next);
> +               desc = &vr->desc[idx];
> +       }
> +}
> +
> +/* Copy console data into the output buffer. */
> +static void mlxbf_tmfifo_console_output(struct mlxbf_tmfifo_vdev *cons,
> +                                       struct mlxbf_tmfifo_vring *vring)
> +{
> +       struct vring_desc *desc;
> +       u32 len, avail;
> +
> +       desc = mlxbf_tmfifo_get_next_desc(vring);
> +       while (desc) {
> +               /* Release the packet if not enough space. */
> +               len = mlxbf_tmfifo_get_pkt_len(vring, desc);
> +               avail = CIRC_SPACE(cons->tx_buf.head, cons->tx_buf.tail,
> +                                  MLXBF_TMFIFO_CON_TX_BUF_SIZE);
> +               if (len + MLXBF_TMFIFO_CON_TX_BUF_RSV_SIZE > avail) {
> +                       mlxbf_tmfifo_release_desc(vring, desc, len);
> +                       break;
> +               }
> +
> +               mlxbf_tmfifo_console_output_one(cons, vring, desc);
> +               mlxbf_tmfifo_release_desc(vring, desc, len);
> +               desc = mlxbf_tmfifo_get_next_desc(vring);
> +       }
> +}
> +
> +/* Get the number of available words in Rx FIFO for receiving. */
> +static int mlxbf_tmfifo_get_rx_avail(struct mlxbf_tmfifo *fifo)
> +{
> +       u64 sts;
> +
> +       sts = readq(fifo->rx_base + MLXBF_TMFIFO_RX_STS);
> +       return FIELD_GET(MLXBF_TMFIFO_RX_STS__COUNT_MASK, sts);
> +}
> +
> +/* Get the number of available words in the TmFifo for sending. */
> +static int mlxbf_tmfifo_get_tx_avail(struct mlxbf_tmfifo *fifo, int vdev_id)
> +{
> +       int tx_reserve;
> +       u32 count;
> +       u64 sts;
> +
> +       /* Reserve some room in FIFO for console messages. */
> +       if (vdev_id == VIRTIO_ID_NET)
> +               tx_reserve = fifo->tx_fifo_size / MLXBF_TMFIFO_RESERVE_RATIO;
> +       else
> +               tx_reserve = 1;
> +
> +       sts = readq(fifo->tx_base + MLXBF_TMFIFO_TX_STS);
> +       count = FIELD_GET(MLXBF_TMFIFO_TX_STS__COUNT_MASK, sts);
> +       return fifo->tx_fifo_size - tx_reserve - count;
> +}
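
For example, assuming the hardware reports the documented reset value of 256
entries (MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_RESET_VAL), networking leaves
256 / 16 = 16 words of the Tx FIFO reserved for console messages, while the
console itself reserves only a single word.
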
> +
> +/* Console Tx (move data from the output buffer into the TmFifo). */
> +static void mlxbf_tmfifo_console_tx(struct mlxbf_tmfifo *fifo, int avail)
> +{
> +       struct mlxbf_tmfifo_msg_hdr hdr;
> +       struct mlxbf_tmfifo_vdev *cons;
> +       unsigned long flags;
> +       int size, seg;
> +       void *addr;
> +       u64 data;
> +
> +       /* Return if not enough space available. */
> +       if (avail < MLXBF_TMFIFO_DATA_MIN_WORDS)
> +               return;
> +
> +       cons = fifo->vdev[VIRTIO_ID_CONSOLE];
> +       if (!cons || !cons->tx_buf.buf)
> +               return;
> +
> +       /* Return if no data to send. */
> +       size = CIRC_CNT(cons->tx_buf.head, cons->tx_buf.tail,
> +                       MLXBF_TMFIFO_CON_TX_BUF_SIZE);
> +       if (size == 0)
> +               return;
> +
> +       /* Adjust the size to available space. */
> +       if (size + sizeof(hdr) > avail * sizeof(u64))
> +               size = avail * sizeof(u64) - sizeof(hdr);
> +
> +       /* Write header. */
> +       hdr.type = VIRTIO_ID_CONSOLE;
> +       hdr.len = htons(size);
> +       writeq(*(u64 *)&hdr, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
> +
> +       /* Use spin-lock to protect the 'cons->tx_buf'. */
> +       spin_lock_irqsave(&fifo->spin_lock, flags);
> +
> +       while (size > 0) {
> +               addr = cons->tx_buf.buf + cons->tx_buf.tail;
> +
> +               seg = CIRC_CNT_TO_END(cons->tx_buf.head, cons->tx_buf.tail,
> +                                     MLXBF_TMFIFO_CON_TX_BUF_SIZE);
> +               if (seg >= sizeof(u64)) {
> +                       memcpy(&data, addr, sizeof(u64));
> +               } else {
> +                       memcpy(&data, addr, seg);
> +                       memcpy((u8 *)&data + seg, cons->tx_buf.buf,
> +                              sizeof(u64) - seg);
> +               }
> +               writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
> +
> +               if (size >= sizeof(u64)) {
> +                       cons->tx_buf.tail = (cons->tx_buf.tail + sizeof(u64)) %
> +                               MLXBF_TMFIFO_CON_TX_BUF_SIZE;
> +                       size -= sizeof(u64);
> +               } else {
> +                       cons->tx_buf.tail = (cons->tx_buf.tail + size) %
> +                               MLXBF_TMFIFO_CON_TX_BUF_SIZE;
> +                       size = 0;
> +               }
> +       }
> +
> +       spin_unlock_irqrestore(&fifo->spin_lock, flags);
> +}
> +
> +/* Rx/Tx one word in the descriptor buffer. */
> +static void mlxbf_tmfifo_rxtx_word(struct mlxbf_tmfifo_vring *vring,
> +                                  struct vring_desc *desc,
> +                                  bool is_rx, int len)
> +{
> +       struct virtio_device *vdev = vring->vq->vdev;
> +       struct mlxbf_tmfifo *fifo = vring->fifo;
> +       void *addr;
> +       u64 data;
> +
> +       /* Get the buffer address of this desc. */
> +       addr = phys_to_virt(virtio64_to_cpu(vdev, desc->addr));
> +
> +       /* Read a word from FIFO for Rx. */
> +       if (is_rx)
> +               data = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
> +
> +       if (vring->cur_len + sizeof(u64) <= len) {
> +               /* The whole word. */
> +               if (is_rx)
> +                       memcpy(addr + vring->cur_len, &data, sizeof(u64));
> +               else
> +                       memcpy(&data, addr + vring->cur_len, sizeof(u64));
> +               vring->cur_len += sizeof(u64);
> +       } else {
> +               /* Leftover bytes. */
> +               if (is_rx)
> +                       memcpy(addr + vring->cur_len, &data,
> +                              len - vring->cur_len);
> +               else
> +                       memcpy(&data, addr + vring->cur_len,
> +                              len - vring->cur_len);
> +               vring->cur_len = len;
> +       }
> +
> +       /* Write the word into FIFO for Tx. */
> +       if (!is_rx)
> +               writeq(data, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
> +}
> +
> +/*
> + * Rx/Tx packet header.
> + *
> + * In Rx case, the packet might be found to belong to a different vring since
> + * the TmFifo is shared by different services. In such case, the 'vring_change'
> + * flag is set.
> + */
> +static void mlxbf_tmfifo_rxtx_header(struct mlxbf_tmfifo_vring *vring,
> +                                    struct vring_desc *desc,
> +                                    bool is_rx, bool *vring_change)
> +{
> +       struct mlxbf_tmfifo *fifo = vring->fifo;
> +       struct virtio_net_config *config;
> +       struct mlxbf_tmfifo_msg_hdr hdr;
> +       int vdev_id, hdr_len;
> +
> +       /* Read/Write packet header. */
> +       if (is_rx) {
> +               /* Drain one word from the FIFO. */
> +               *(u64 *)&hdr = readq(fifo->rx_base + MLXBF_TMFIFO_RX_DATA);
> +
> +               /* Skip the length 0 packets (keepalive). */
> +               if (hdr.len == 0)
> +                       return;
> +
> +               /* Check packet type. */
> +               if (hdr.type == VIRTIO_ID_NET) {
> +                       vdev_id = VIRTIO_ID_NET;
> +                       hdr_len = sizeof(struct virtio_net_hdr);
> +                       config = &fifo->vdev[vdev_id]->config.net;
> +                       if (ntohs(hdr.len) > config->mtu +
> +                           MLXBF_TMFIFO_NET_L2_OVERHEAD)
> +                               return;
> +               } else {
> +                       vdev_id = VIRTIO_ID_CONSOLE;
> +                       hdr_len = 0;
> +               }
> +
> +               /*
> +                * Check whether the new packet still belongs to this vring.
> +                * If not, update the pkt_len of the new vring.
> +                */
> +               if (vdev_id != vring->vdev_id) {
> +                       struct mlxbf_tmfifo_vdev *tm_dev2 = fifo->vdev[vdev_id];
> +
> +                       if (!tm_dev2)
> +                               return;
> +                       vring->desc = desc;
> +                       vring = &tm_dev2->vrings[MLXBF_TMFIFO_VRING_RX];
> +                       *vring_change = true;
> +               }
> +               vring->pkt_len = ntohs(hdr.len) + hdr_len;
> +       } else {
> +               /* Network virtio has an extra header. */
> +               hdr_len = (vring->vdev_id == VIRTIO_ID_NET) ?
> +                          sizeof(struct virtio_net_hdr) : 0;
> +               vring->pkt_len = mlxbf_tmfifo_get_pkt_len(vring, desc);
> +               hdr.type = (vring->vdev_id == VIRTIO_ID_NET) ?
> +                           VIRTIO_ID_NET : VIRTIO_ID_CONSOLE;
> +               hdr.len = htons(vring->pkt_len - hdr_len);
> +               writeq(*(u64 *)&hdr, fifo->tx_base + MLXBF_TMFIFO_TX_DATA);
> +       }
> +
> +       vring->cur_len = hdr_len;
> +       vring->rem_len = vring->pkt_len;
> +       fifo->vring[is_rx] = vring;
> +}
> +
> +/*
> + * Rx/Tx one descriptor.
> + *
> + * Return true to indicate more data available.
> + */
> +static bool mlxbf_tmfifo_rxtx_one_desc(struct mlxbf_tmfifo_vring *vring,
> +                                      bool is_rx, int *avail)
> +{
> +       const struct vring *vr = virtqueue_get_vring(vring->vq);
> +       struct mlxbf_tmfifo *fifo = vring->fifo;
> +       struct virtio_device *vdev;
> +       bool vring_change = false;
> +       struct vring_desc *desc;
> +       unsigned long flags;
> +       u32 len, idx;
> +
> +       vdev = &fifo->vdev[vring->vdev_id]->vdev;
> +
> +       /* Get the descriptor of the next packet. */
> +       if (!vring->desc) {
> +               desc = mlxbf_tmfifo_get_next_pkt(vring, is_rx);
> +               if (!desc)
> +                       return false;
> +       } else {
> +               desc = vring->desc;
> +       }
> +
> +       /* Beginning of a packet. Start to Rx/Tx packet header. */
> +       if (vring->pkt_len == 0) {
> +               mlxbf_tmfifo_rxtx_header(vring, desc, is_rx, &vring_change);
> +               (*avail)--;
> +
> +               /* Return if new packet is for another ring. */
> +               if (vring_change)
> +                       return false;
> +               goto mlxbf_tmfifo_desc_done;
> +       }
> +
> +       /* Get the length of this desc. */
> +       len = virtio32_to_cpu(vdev, desc->len);
> +       if (len > vring->rem_len)
> +               len = vring->rem_len;
> +
> +       /* Rx/Tx one word (8 bytes) if not done. */
> +       if (vring->cur_len < len) {
> +               mlxbf_tmfifo_rxtx_word(vring, desc, is_rx, len);
> +               (*avail)--;
> +       }
> +
> +       /* Check again whether it's done. */
> +       if (vring->cur_len == len) {
> +               vring->cur_len = 0;
> +               vring->rem_len -= len;
> +
> +               /* Get the next desc on the chain. */
> +               if (vring->rem_len > 0 &&
> +                   (virtio16_to_cpu(vdev, desc->flags) & VRING_DESC_F_NEXT)) {
> +                       idx = virtio16_to_cpu(vdev, desc->next);
> +                       desc = &vr->desc[idx];
> +                       goto mlxbf_tmfifo_desc_done;
> +               }
> +
> +               /* Done and release the pending packet. */
> +               mlxbf_tmfifo_release_pending_pkt(vring);
> +               desc = NULL;
> +               fifo->vring[is_rx] = NULL;
> +
> +               /* Notify upper layer that packet is done. */
> +               spin_lock_irqsave(&fifo->spin_lock, flags);
> +               vring_interrupt(0, vring->vq);
> +               spin_unlock_irqrestore(&fifo->spin_lock, flags);
> +       }
> +
> +mlxbf_tmfifo_desc_done:
> +       /* Save the current desc. */
> +       vring->desc = desc;
> +
> +       return true;
> +}
> +
> +/* Rx & Tx processing of a queue. */
> +static void mlxbf_tmfifo_rxtx(struct mlxbf_tmfifo_vring *vring, bool is_rx)
> +{
> +       int avail = 0, devid = vring->vdev_id;
> +       struct mlxbf_tmfifo *fifo;
> +       bool more;
> +
> +       fifo = vring->fifo;
> +
> +       /* Return if vdev is not ready. */
> +       if (!fifo->vdev[devid])
> +               return;
> +
> +       /* Return if another vring is running. */
> +       if (fifo->vring[is_rx] && fifo->vring[is_rx] != vring)
> +               return;
> +
> +       /* Only handle console and network for now. */
> +       if (WARN_ON(devid != VIRTIO_ID_NET && devid != VIRTIO_ID_CONSOLE))
> +               return;
> +
> +       do {
> +               /* Get available FIFO space. */
> +               if (avail == 0) {
> +                       if (is_rx)
> +                               avail = mlxbf_tmfifo_get_rx_avail(fifo);
> +                       else
> +                               avail = mlxbf_tmfifo_get_tx_avail(fifo, devid);
> +                       if (avail <= 0)
> +                               break;
> +               }
> +
> +               /* Console output always comes from the Tx buffer. */
> +               if (!is_rx && devid == VIRTIO_ID_CONSOLE) {
> +                       mlxbf_tmfifo_console_tx(fifo, avail);
> +                       break;
> +               }
> +
> +               /* Handle one descriptor. */
> +               more = mlxbf_tmfifo_rxtx_one_desc(vring, is_rx, &avail);
> +       } while (more);
> +}
> +
> +/* Handle Rx or Tx queues. */
> +static void mlxbf_tmfifo_work_rxtx(struct mlxbf_tmfifo *fifo, int queue_id,
> +                                  int irq_id, bool is_rx)
> +{
> +       struct mlxbf_tmfifo_vdev *tm_vdev;
> +       struct mlxbf_tmfifo_vring *vring;
> +       int i;
> +
> +       if (!test_and_clear_bit(irq_id, &fifo->pend_events) ||
> +           !fifo->irq_info[irq_id].irq)
> +               return;
> +
> +       for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++) {
> +               tm_vdev = fifo->vdev[i];
> +               if (tm_vdev) {
> +                       vring = &tm_vdev->vrings[queue_id];
> +                       if (vring->vq)
> +                               mlxbf_tmfifo_rxtx(vring, is_rx);
> +               }
> +       }
> +}
> +
> +/* Work handler for Rx and Tx case. */
> +static void mlxbf_tmfifo_work_handler(struct work_struct *work)
> +{
> +       struct mlxbf_tmfifo *fifo;
> +
> +       fifo = container_of(work, struct mlxbf_tmfifo, work);
> +       if (!fifo->is_ready)
> +               return;
> +
> +       mutex_lock(&fifo->lock);
> +
> +       /* Tx (Send data to the TmFifo). */
> +       mlxbf_tmfifo_work_rxtx(fifo, MLXBF_TMFIFO_VRING_TX,
> +                              MLXBF_TM_TX_LWM_IRQ, false);
> +
> +       /* Rx (Receive data from the TmFifo). */
> +       mlxbf_tmfifo_work_rxtx(fifo, MLXBF_TMFIFO_VRING_RX,
> +                              MLXBF_TM_RX_HWM_IRQ, true);
> +
> +       mutex_unlock(&fifo->lock);
> +}
> +
> +/* The notify function is called when new buffers are posted. */
> +static bool mlxbf_tmfifo_virtio_notify(struct virtqueue *vq)
> +{
> +       struct mlxbf_tmfifo_vring *vring = vq->priv;
> +       struct mlxbf_tmfifo_vdev *tm_vdev;
> +       struct mlxbf_tmfifo *fifo;
> +       unsigned long flags;
> +
> +       fifo = vring->fifo;
> +
> +       /*
> +        * Virtio maintains vrings in pairs, even number ring for Rx
> +        * and odd number ring for Tx.
> +        */
> +       if (vring->index & BIT(0)) {
> +               /*
> +                * Console could make blocking call with interrupts disabled.
> +                * In such case, the vring needs to be served right away. For
> +                * other cases, just set the TX LWM bit to start Tx in the
> +                * worker handler.
> +                */
> +               if (vring->vdev_id == VIRTIO_ID_CONSOLE) {
> +                       spin_lock_irqsave(&fifo->spin_lock, flags);
> +                       tm_vdev = fifo->vdev[VIRTIO_ID_CONSOLE];
> +                       mlxbf_tmfifo_console_output(tm_vdev, vring);
> +                       spin_unlock_irqrestore(&fifo->spin_lock, flags);
> +               } else if (test_and_set_bit(MLXBF_TM_TX_LWM_IRQ,
> +                                           &fifo->pend_events)) {
> +                       return true;
> +               }
> +       } else {
> +               if (test_and_set_bit(MLXBF_TM_RX_HWM_IRQ, &fifo->pend_events))
> +                       return true;
> +       }
> +
> +       schedule_work(&fifo->work);
> +
> +       return true;
> +}
> +
> +/* Get the array of feature bits for this device. */
> +static u64 mlxbf_tmfifo_virtio_get_features(struct virtio_device *vdev)
> +{
> +       struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
> +
> +       return tm_vdev->features;
> +}
> +
> +/* Confirm device features to use. */
> +static int mlxbf_tmfifo_virtio_finalize_features(struct virtio_device *vdev)
> +{
> +       struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
> +
> +       tm_vdev->features = vdev->features;
> +
> +       return 0;
> +}
> +
> +/* Free virtqueues found by find_vqs(). */
> +static void mlxbf_tmfifo_virtio_del_vqs(struct virtio_device *vdev)
> +{
> +       struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
> +       struct mlxbf_tmfifo_vring *vring;
> +       struct virtqueue *vq;
> +       int i;
> +
> +       for (i = 0; i < ARRAY_SIZE(tm_vdev->vrings); i++) {
> +               vring = &tm_vdev->vrings[i];
> +
> +               /* Release the pending packet. */
> +               if (vring->desc)
> +                       mlxbf_tmfifo_release_pending_pkt(vring);
> +               vq = vring->vq;
> +               if (vq) {
> +                       vring->vq = NULL;
> +                       vring_del_virtqueue(vq);
> +               }
> +       }
> +}
> +
> +/* Create and initialize the virtual queues. */
> +static int mlxbf_tmfifo_virtio_find_vqs(struct virtio_device *vdev,
> +                                       unsigned int nvqs,
> +                                       struct virtqueue *vqs[],
> +                                       vq_callback_t *callbacks[],
> +                                       const char * const names[],
> +                                       const bool *ctx,
> +                                       struct irq_affinity *desc)
> +{
> +       struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
> +       struct mlxbf_tmfifo_vring *vring;
> +       struct virtqueue *vq;
> +       int i, ret, size;
> +
> +       if (nvqs > ARRAY_SIZE(tm_vdev->vrings))
> +               return -EINVAL;
> +
> +       for (i = 0; i < nvqs; ++i) {
> +               if (!names[i]) {
> +                       ret = -EINVAL;
> +                       goto error;
> +               }
> +               vring = &tm_vdev->vrings[i];
> +
> +               /* zero vring */
> +               size = vring_size(vring->num, vring->align);
> +               memset(vring->va, 0, size);
> +               vq = vring_new_virtqueue(i, vring->num, vring->align, vdev,
> +                                        false, false, vring->va,
> +                                        mlxbf_tmfifo_virtio_notify,
> +                                        callbacks[i], names[i]);
> +               if (!vq) {
> +                       dev_err(&vdev->dev, "vring_new_virtqueue failed\n");
> +                       ret = -ENOMEM;
> +                       goto error;
> +               }
> +
> +               vqs[i] = vq;
> +               vring->vq = vq;
> +               vq->priv = vring;
> +       }
> +
> +       return 0;
> +
> +error:
> +       mlxbf_tmfifo_virtio_del_vqs(vdev);
> +       return ret;
> +}
> +
> +/* Read the status byte. */
> +static u8 mlxbf_tmfifo_virtio_get_status(struct virtio_device *vdev)
> +{
> +       struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
> +
> +       return tm_vdev->status;
> +}
> +
> +/* Write the status byte. */
> +static void mlxbf_tmfifo_virtio_set_status(struct virtio_device *vdev,
> +                                          u8 status)
> +{
> +       struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
> +
> +       tm_vdev->status = status;
> +}
> +
> +/* Reset the device. Not much here for now. */
> +static void mlxbf_tmfifo_virtio_reset(struct virtio_device *vdev)
> +{
> +       struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
> +
> +       tm_vdev->status = 0;
> +}
> +
> +/* Read the value of a configuration field. */
> +static void mlxbf_tmfifo_virtio_get(struct virtio_device *vdev,
> +                                   unsigned int offset,
> +                                   void *buf,
> +                                   unsigned int len)
> +{
> +       struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
> +
> +       if ((u64)offset + len > sizeof(tm_vdev->config))
> +               return;
> +
> +       memcpy(buf, (u8 *)&tm_vdev->config + offset, len);
> +}
> +
> +/* Write the value of a configuration field. */
> +static void mlxbf_tmfifo_virtio_set(struct virtio_device *vdev,
> +                                   unsigned int offset,
> +                                   const void *buf,
> +                                   unsigned int len)
> +{
> +       struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
> +
> +       if ((u64)offset + len > sizeof(tm_vdev->config))
> +               return;
> +
> +       memcpy((u8 *)&tm_vdev->config + offset, buf, len);
> +}
> +
> +static void tmfifo_virtio_dev_release(struct device *device)
> +{
> +       struct virtio_device *vdev =
> +                       container_of(device, struct virtio_device, dev);
> +       struct mlxbf_tmfifo_vdev *tm_vdev = mlxbf_vdev_to_tmfifo(vdev);
> +
> +       kfree(tm_vdev);
> +}
> +
> +/* Virtio config operations. */
> +static const struct virtio_config_ops mlxbf_tmfifo_virtio_config_ops = {
> +       .get_features = mlxbf_tmfifo_virtio_get_features,
> +       .finalize_features = mlxbf_tmfifo_virtio_finalize_features,
> +       .find_vqs = mlxbf_tmfifo_virtio_find_vqs,
> +       .del_vqs = mlxbf_tmfifo_virtio_del_vqs,
> +       .reset = mlxbf_tmfifo_virtio_reset,
> +       .set_status = mlxbf_tmfifo_virtio_set_status,
> +       .get_status = mlxbf_tmfifo_virtio_get_status,
> +       .get = mlxbf_tmfifo_virtio_get,
> +       .set = mlxbf_tmfifo_virtio_set,
> +};
> +
> +/* Create vdev for the FIFO. */
> +static int mlxbf_tmfifo_create_vdev(struct device *dev,
> +                                   struct mlxbf_tmfifo *fifo,
> +                                   int vdev_id, u64 features,
> +                                   void *config, u32 size)
> +{
> +       struct mlxbf_tmfifo_vdev *tm_vdev, *reg_dev = NULL;
> +       int ret;
> +
> +       mutex_lock(&fifo->lock);
> +
> +       tm_vdev = fifo->vdev[vdev_id];
> +       if (tm_vdev) {
> +               dev_err(dev, "vdev %d already exists\n", vdev_id);
> +               ret = -EEXIST;
> +               goto fail;
> +       }
> +
> +       tm_vdev = kzalloc(sizeof(*tm_vdev), GFP_KERNEL);
> +       if (!tm_vdev) {
> +               ret = -ENOMEM;
> +               goto fail;
> +       }
> +
> +       tm_vdev->vdev.id.device = vdev_id;
> +       tm_vdev->vdev.config = &mlxbf_tmfifo_virtio_config_ops;
> +       tm_vdev->vdev.dev.parent = dev;
> +       tm_vdev->vdev.dev.release = tmfifo_virtio_dev_release;
> +       tm_vdev->features = features;
> +       if (config)
> +               memcpy(&tm_vdev->config, config, size);
> +
> +       if (mlxbf_tmfifo_alloc_vrings(fifo, tm_vdev)) {
> +               dev_err(dev, "unable to allocate vring\n");
> +               ret = -ENOMEM;
> +               goto vdev_fail;
> +       }
> +
> +       /* Allocate an output buffer for the console device. */
> +       if (vdev_id == VIRTIO_ID_CONSOLE)
> +               tm_vdev->tx_buf.buf = devm_kmalloc(dev,
> +                                                  MLXBF_TMFIFO_CON_TX_BUF_SIZE,
> +                                                  GFP_KERNEL);
> +       fifo->vdev[vdev_id] = tm_vdev;
> +
> +       /* Register the virtio device. */
> +       ret = register_virtio_device(&tm_vdev->vdev);
> +       reg_dev = tm_vdev;
> +       if (ret) {
> +               dev_err(dev, "register_virtio_device failed\n");
> +               goto vdev_fail;
> +       }
> +
> +       mutex_unlock(&fifo->lock);
> +       return 0;
> +
> +vdev_fail:
> +       mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
> +       fifo->vdev[vdev_id] = NULL;
> +       if (reg_dev)
> +               put_device(&tm_vdev->vdev.dev);
> +       else
> +               kfree(tm_vdev);
> +fail:
> +       mutex_unlock(&fifo->lock);
> +       return ret;
> +}
> +
> +/* Delete vdev for the FIFO. */
> +static int mlxbf_tmfifo_delete_vdev(struct mlxbf_tmfifo *fifo, int vdev_id)
> +{
> +       struct mlxbf_tmfifo_vdev *tm_vdev;
> +
> +       mutex_lock(&fifo->lock);
> +
> +       /* Unregister vdev. */
> +       tm_vdev = fifo->vdev[vdev_id];
> +       if (tm_vdev) {
> +               unregister_virtio_device(&tm_vdev->vdev);
> +               mlxbf_tmfifo_free_vrings(fifo, tm_vdev);
> +               fifo->vdev[vdev_id] = NULL;
> +       }
> +
> +       mutex_unlock(&fifo->lock);
> +
> +       return 0;
> +}
> +
> +/* Read the configured network MAC address from efi variable. */
> +static void mlxbf_tmfifo_get_cfg_mac(u8 *mac)
> +{
> +       efi_guid_t guid = EFI_GLOBAL_VARIABLE_GUID;
> +       unsigned long size = ETH_ALEN;
> +       u8 buf[ETH_ALEN];
> +       efi_status_t rc;
> +
> +       rc = efi.get_variable(mlxbf_tmfifo_efi_name, &guid, NULL, &size, buf);
> +       if (rc == EFI_SUCCESS && size == ETH_ALEN)
> +               ether_addr_copy(mac, buf);
> +       else
> +               ether_addr_copy(mac, mlxbf_tmfifo_net_default_mac);
> +}
> +
> +/* Set the TmFifo thresholds which are used to trigger interrupts. */
> +static void mlxbf_tmfifo_set_threshold(struct mlxbf_tmfifo *fifo)
> +{
> +       u64 ctl;
> +
> +       /* Get Tx FIFO size and set the low/high watermark. */
> +       ctl = readq(fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
> +       fifo->tx_fifo_size =
> +               FIELD_GET(MLXBF_TMFIFO_TX_CTL__MAX_ENTRIES_MASK, ctl);
> +       ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__LWM_MASK) |
> +               FIELD_PREP(MLXBF_TMFIFO_TX_CTL__LWM_MASK,
> +                          fifo->tx_fifo_size / 2);
> +       ctl = (ctl & ~MLXBF_TMFIFO_TX_CTL__HWM_MASK) |
> +               FIELD_PREP(MLXBF_TMFIFO_TX_CTL__HWM_MASK,
> +                          fifo->tx_fifo_size - 1);
> +       writeq(ctl, fifo->tx_base + MLXBF_TMFIFO_TX_CTL);
> +
> +       /* Get Rx FIFO size and set the low/high watermark. */
> +       ctl = readq(fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
> +       fifo->rx_fifo_size =
> +               FIELD_GET(MLXBF_TMFIFO_RX_CTL__MAX_ENTRIES_MASK, ctl);
> +       ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__LWM_MASK) |
> +               FIELD_PREP(MLXBF_TMFIFO_RX_CTL__LWM_MASK, 0);
> +       ctl = (ctl & ~MLXBF_TMFIFO_RX_CTL__HWM_MASK) |
> +               FIELD_PREP(MLXBF_TMFIFO_RX_CTL__HWM_MASK, 1);
> +       writeq(ctl, fifo->rx_base + MLXBF_TMFIFO_RX_CTL);
> +}
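
With a 256-entry FIFO, for example, this programs the Tx low watermark to 128
and the high watermark to 255, so a TX_LWM event effectively fires once the
FIFO has drained to half full; on the Rx side (LWM 0, HWM 1) an RX_HWM event
is raised as soon as a single word arrives.
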
> +
> +static void mlxbf_tmfifo_cleanup(struct mlxbf_tmfifo *fifo)
> +{
> +       int i;
> +
> +       fifo->is_ready = false;
> +       del_timer_sync(&fifo->timer);
> +       mlxbf_tmfifo_disable_irqs(fifo);
> +       cancel_work_sync(&fifo->work);
> +       for (i = 0; i < MLXBF_TMFIFO_VDEV_MAX; i++)
> +               mlxbf_tmfifo_delete_vdev(fifo, i);
> +}
> +
> +/* Probe the TmFifo. */
> +static int mlxbf_tmfifo_probe(struct platform_device *pdev)
> +{
> +       struct virtio_net_config net_config;
> +       struct device *dev = &pdev->dev;
> +       struct mlxbf_tmfifo *fifo;
> +       int i, rc;
> +
> +       fifo = devm_kzalloc(dev, sizeof(*fifo), GFP_KERNEL);
> +       if (!fifo)
> +               return -ENOMEM;
> +
> +       spin_lock_init(&fifo->spin_lock);
> +       INIT_WORK(&fifo->work, mlxbf_tmfifo_work_handler);
> +       mutex_init(&fifo->lock);
> +
> +       /* Get the resource of the Rx FIFO. */
> +       fifo->rx_base = devm_platform_ioremap_resource(pdev, 0);
> +       if (IS_ERR(fifo->rx_base))
> +               return PTR_ERR(fifo->rx_base);
> +
> +       /* Get the resource of the Tx FIFO. */
> +       fifo->tx_base = devm_platform_ioremap_resource(pdev, 1);
> +       if (IS_ERR(fifo->tx_base))
> +               return PTR_ERR(fifo->tx_base);
> +
> +       platform_set_drvdata(pdev, fifo);
> +
> +       timer_setup(&fifo->timer, mlxbf_tmfifo_timer, 0);
> +
> +       for (i = 0; i < MLXBF_TM_MAX_IRQ; i++) {
> +               fifo->irq_info[i].index = i;
> +               fifo->irq_info[i].fifo = fifo;
> +               fifo->irq_info[i].irq = platform_get_irq(pdev, i);
> +               rc = devm_request_irq(dev, fifo->irq_info[i].irq,
> +                                     mlxbf_tmfifo_irq_handler, 0,
> +                                     "tmfifo", &fifo->irq_info[i]);
> +               if (rc) {
> +                       dev_err(dev, "devm_request_irq failed\n");
> +                       fifo->irq_info[i].irq = 0;
> +                       return rc;
> +               }
> +       }
> +
> +       mlxbf_tmfifo_set_threshold(fifo);
> +
> +       /* Create the console vdev. */
> +       rc = mlxbf_tmfifo_create_vdev(dev, fifo, VIRTIO_ID_CONSOLE, 0, NULL, 0);
> +       if (rc)
> +               goto fail;
> +
> +       /* Create the network vdev. */
> +       memset(&net_config, 0, sizeof(net_config));
> +       net_config.mtu = ETH_DATA_LEN;
> +       net_config.status = VIRTIO_NET_S_LINK_UP;
> +       mlxbf_tmfifo_get_cfg_mac(net_config.mac);
> +       rc = mlxbf_tmfifo_create_vdev(dev, fifo, VIRTIO_ID_NET,
> +                                     MLXBF_TMFIFO_NET_FEATURES, &net_config,
> +                                     sizeof(net_config));
> +       if (rc)
> +               goto fail;
> +
> +       mod_timer(&fifo->timer, jiffies + MLXBF_TMFIFO_TIMER_INTERVAL);
> +
> +       fifo->is_ready = true;
> +       return 0;
> +
> +fail:
> +       mlxbf_tmfifo_cleanup(fifo);
> +       return rc;
> +}
> +
> +/* Device remove function. */
> +static int mlxbf_tmfifo_remove(struct platform_device *pdev)
> +{
> +       struct mlxbf_tmfifo *fifo = platform_get_drvdata(pdev);
> +
> +       mlxbf_tmfifo_cleanup(fifo);
> +
> +       return 0;
> +}
> +
> +static const struct acpi_device_id mlxbf_tmfifo_acpi_match[] = {
> +       { "MLNXBF01", 0 },
> +       {}
> +};
> +MODULE_DEVICE_TABLE(acpi, mlxbf_tmfifo_acpi_match);
> +
> +static struct platform_driver mlxbf_tmfifo_driver = {
> +       .probe = mlxbf_tmfifo_probe,
> +       .remove = mlxbf_tmfifo_remove,
> +       .driver = {
> +               .name = "bf-tmfifo",
> +               .acpi_match_table = mlxbf_tmfifo_acpi_match,
> +       },
> +};
> +
> +module_platform_driver(mlxbf_tmfifo_driver);
> +
> +MODULE_DESCRIPTION("Mellanox BlueField SoC TmFifo Driver");
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR("Mellanox Technologies");
> --
> 1.8.3.1
>


-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 30+ messages in thread

Thread overview: 30+ messages
     [not found] <b143b40446c1870fb8d422b364ead95d54552be9.1527264077.git.lsun@mellanox.com>
2019-01-28 17:28 ` [PATCH v8 0/2] TmFifo platform driver for Mellanox BlueField SoC Liming Sun
2019-01-28 17:28 ` [PATCH v8 1/2] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc Liming Sun
2019-01-29 22:06   ` Andy Shevchenko
2019-02-13 16:33     ` Liming Sun
2019-01-30  6:24   ` Vadim Pasternak
2019-01-28 17:28 ` [PATCH v8 2/2] dt-bindings: soc: Add TmFifo binding for Mellanox BlueField SoC Liming Sun
2019-02-13 13:27 ` [PATCH v9] platform/mellanox: Add TmFifo driver for Mellanox BlueField Soc Liming Sun
2019-02-13 18:11   ` Andy Shevchenko
2019-02-13 18:34     ` Liming Sun
2019-02-14 16:25     ` Liming Sun
2019-02-28 15:51     ` Liming Sun
2019-02-28 15:51 ` [PATCH v10] " Liming Sun
2019-03-05 15:34   ` Andy Shevchenko
2019-03-06 20:00     ` Liming Sun
2019-03-08 14:44       ` Liming Sun
2019-03-08 14:41 ` [PATCH v11] " Liming Sun
2019-03-26 21:13 ` Liming Sun
2019-03-28 19:56 ` [PATCH v12] " Liming Sun
2019-04-04 19:36 ` [PATCH v13] " Liming Sun
2019-04-05 15:44   ` Andy Shevchenko
2019-04-05 19:10     ` Liming Sun
2019-04-07  2:05       ` Liming Sun
2019-04-11 14:13         ` Andy Shevchenko
2019-04-12 16:15           ` Liming Sun
2019-04-07  2:03 ` [PATCH v14] " Liming Sun
2019-04-11 14:09   ` Andy Shevchenko
2019-04-12 14:23     ` Liming Sun
2019-04-12 17:30 ` [PATCH v15] " Liming Sun
2019-05-03 13:49 ` [PATCH v16] " Liming Sun
2019-05-06  9:13   ` Andy Shevchenko
