* [PATCH 0/3] add hisilicon hip04 ethernet driver
@ 2014-03-18  8:40 ` Zhangfei Gao
  0 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-18  8:40 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-arm-kernel, netdev, devicetree, Zhangfei Gao

Zhangfei Gao (3):
  Documentation: add Device tree bindings for Hisilicon hip04 ethernet
  net: hisilicon: new hip04 MDIO driver
  net: hisilicon: new hip04 ethernet driver

 .../bindings/net/hisilicon-hip04-net.txt           |   74 ++
 drivers/net/ethernet/Kconfig                       |    1 +
 drivers/net/ethernet/Makefile                      |    1 +
 drivers/net/ethernet/hisilicon/Kconfig             |   31 +
 drivers/net/ethernet/hisilicon/Makefile            |    5 +
 drivers/net/ethernet/hisilicon/hip04_eth.c         |  717 ++++++++++++++++++++
 drivers/net/ethernet/hisilicon/hip04_mdio.c        |  190 ++++++
 7 files changed, 1019 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
 create mode 100644 drivers/net/ethernet/hisilicon/Kconfig
 create mode 100644 drivers/net/ethernet/hisilicon/Makefile
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_mdio.c

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 148+ messages in thread

* [PATCH 1/3] Documentation: add Device tree bindings for Hisilicon hip04 ethernet
  2014-03-18  8:40 ` Zhangfei Gao
@ 2014-03-18  8:40   ` Zhangfei Gao
  -1 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-18  8:40 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-arm-kernel, netdev, devicetree, Zhangfei Gao

This patch adds the Device Tree bindings for the Hisilicon hip04
Ethernet controller, covering both the 100M and 1000M variants.

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
---
 .../bindings/net/hisilicon-hip04-net.txt           |   74 ++++++++++++++++++++
 1 file changed, 74 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt

diff --git a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
new file mode 100644
index 0000000..c918f08
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
@@ -0,0 +1,74 @@
+Hisilicon hip04 Ethernet Controller
+
+* Ethernet controller node
+
+Required properties:
+- compatible: should be "hisilicon,hip04-mac".
+- reg: address and length of the register set for the device.
+- interrupts: interrupt for the device.
+- port: PPE port number connected to the controller; valid range 0 to 31.
+- speed: 100 (100M) or 1000 (1000M).
+- id: controller index; must be unique for each controller, and the fe
+  (100M) controller must use 0.
+
+Optional Properties:
+- phy-handle : the phandle to a PHY node
+
+
+* Ethernet PPE node: controls the RX & TX FIFOs of all Ethernet controllers
+
+Required properties:
+- compatible: should be "hisilicon,hip04-ppebase".
+- reg: address and length of the register set for the node.
+
+
+* MDIO bus node:
+
+Required properties:
+
+- compatible: "hisilicon,hip04-mdio"
+- Inherits from the MDIO bus node binding [1]
+[1] Documentation/devicetree/bindings/net/phy.txt
+
+Example:
+	mdio {
+		compatible = "hisilicon,hip04-mdio";
+		reg = <0x28f1000 0x1000>;
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		phy0: ethernet-phy@0 {
+			reg = <0>;
+			marvell,reg-init = <18 0x14 0 0x8001>;
+			device_type = "ethernet-phy";
+		};
+
+		phy1: ethernet-phy@1 {
+			reg = <1>;
+			marvell,reg-init = <18 0x14 0 0x8001>;
+			device_type = "ethernet-phy";
+		};
+	};
+
+	ppebase: ppebase@28c0000 {
+		compatible = "hisilicon,hip04-ppebase";
+		reg = <0x28c0000 0x10000>;
+	};
+
+	fe: ethernet@28b0000 {
+		compatible = "hisilicon,hip04-mac";
+		reg = <0x28b0000 0x10000>;
+		interrupts = <0 413 4>;
+		port = <31>;
+		speed = <100>;
+		id = <0>;
+	};
+
+	ge0: ethernet@2800000 {
+		compatible = "hisilicon,hip04-mac";
+		reg = <0x2800000 0x10000>;
+		interrupts = <0 402 4>;
+		port = <0>;
+		speed = <1000>;
+		id = <1>;
+		phy-handle = <&phy0>;
+	};
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 2/3] net: hisilicon: new hip04 MDIO driver
  2014-03-18  8:40 ` Zhangfei Gao
@ 2014-03-18  8:40   ` Zhangfei Gao
  -1 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-18  8:40 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-arm-kernel, netdev, devicetree, Zhangfei Gao

Add the Hisilicon hip04 platform MDIO driver, which reuses the
Marvell PHY driver (drivers/net/phy/marvell.c).

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
---
 drivers/net/ethernet/Kconfig                |    1 +
 drivers/net/ethernet/Makefile               |    1 +
 drivers/net/ethernet/hisilicon/Kconfig      |   31 +++++
 drivers/net/ethernet/hisilicon/Makefile     |    5 +
 drivers/net/ethernet/hisilicon/hip04_mdio.c |  190 +++++++++++++++++++++++++++
 5 files changed, 228 insertions(+)
 create mode 100644 drivers/net/ethernet/hisilicon/Kconfig
 create mode 100644 drivers/net/ethernet/hisilicon/Makefile
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_mdio.c

diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
index 39484b5..cef103d 100644
--- a/drivers/net/ethernet/Kconfig
+++ b/drivers/net/ethernet/Kconfig
@@ -55,6 +55,7 @@ source "drivers/net/ethernet/neterion/Kconfig"
 source "drivers/net/ethernet/faraday/Kconfig"
 source "drivers/net/ethernet/freescale/Kconfig"
 source "drivers/net/ethernet/fujitsu/Kconfig"
+source "drivers/net/ethernet/hisilicon/Kconfig"
 source "drivers/net/ethernet/hp/Kconfig"
 source "drivers/net/ethernet/ibm/Kconfig"
 source "drivers/net/ethernet/intel/Kconfig"
diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
index adf61af..f70b166 100644
--- a/drivers/net/ethernet/Makefile
+++ b/drivers/net/ethernet/Makefile
@@ -30,6 +30,7 @@ obj-$(CONFIG_NET_VENDOR_EXAR) += neterion/
 obj-$(CONFIG_NET_VENDOR_FARADAY) += faraday/
 obj-$(CONFIG_NET_VENDOR_FREESCALE) += freescale/
 obj-$(CONFIG_NET_VENDOR_FUJITSU) += fujitsu/
+obj-$(CONFIG_NET_VENDOR_HISILICON) += hisilicon/
 obj-$(CONFIG_NET_VENDOR_HP) += hp/
 obj-$(CONFIG_NET_VENDOR_IBM) += ibm/
 obj-$(CONFIG_NET_VENDOR_INTEL) += intel/
diff --git a/drivers/net/ethernet/hisilicon/Kconfig b/drivers/net/ethernet/hisilicon/Kconfig
new file mode 100644
index 0000000..4b1c065
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/Kconfig
@@ -0,0 +1,31 @@
+#
+# HISILICON device configuration
+#
+
+config NET_VENDOR_HISILICON
+	bool "Hisilicon devices"
+	default y
+	depends on ARM
+	---help---
+	  If you have a network (Ethernet) card belonging to this class, say Y
+	  and read the Ethernet-HOWTO, available from
+	  <http://www.tldp.org/docs.html#howto>.
+
+	  Note that the answer to this question doesn't directly affect the
+	  kernel: saying N will just cause the configurator to skip all
+	  the questions about Hisilicon devices. If you say Y, you will be asked
+	  for your specific card in the following questions.
+
+if NET_VENDOR_HISILICON
+
+config HIP04_ETH
+	tristate "HISILICON HIP04 Ethernet support"
+	select NET_CORE
+	select PHYLIB
+	select MARVELL_PHY
+	---help---
+	  If you wish to compile a kernel for hardware with a Hisilicon hip04
+	  SoC and want to use the internal Ethernet, then you should answer Y here.
+
+
+endif # NET_VENDOR_HISILICON
diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
new file mode 100644
index 0000000..1d6eb6e
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -0,0 +1,5 @@
+#
+# Makefile for the HISILICON network device drivers.
+#
+
+obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_mdio.c b/drivers/net/ethernet/hisilicon/hip04_mdio.c
new file mode 100644
index 0000000..960adc2
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_mdio.c
@@ -0,0 +1,190 @@
+
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/io.h>
+#include <linux/of_mdio.h>
+#include <linux/delay.h>
+
+#define MDIO_CMD_REG		0x0
+#define MDIO_ADDR_REG		0x4
+#define MDIO_WDATA_REG		0x8
+#define MDIO_RDATA_REG		0xc
+#define MDIO_STA_REG		0x10
+
+#define MDIO_START		BIT(14)
+#define MDIO_R_VALID		BIT(1)
+#define MDIO_READ	        (BIT(12) | BIT(11) | MDIO_START)
+#define MDIO_WRITE	        (BIT(12) | BIT(10) | MDIO_START)
+
+struct hip04_mdio_priv {
+	void __iomem *base;
+};
+
+#define WAIT_TIMEOUT 10
+static int hip04_mdio_wait_ready(struct mii_bus *bus)
+{
+	struct hip04_mdio_priv *priv = bus->priv;
+	int i;
+
+	for (i = 0; readl_relaxed(priv->base + MDIO_CMD_REG) & MDIO_START; i++) {
+		if (i == WAIT_TIMEOUT)
+			return -ETIMEDOUT;
+		msleep(20);
+	}
+
+	return 0;
+}
+
+static int hip04_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
+{
+	struct hip04_mdio_priv *priv = bus->priv;
+	u32 val;
+	int ret;
+
+	ret = hip04_mdio_wait_ready(bus);
+	if (ret < 0)
+		goto out;
+
+	val = regnum | (mii_id << 5) | MDIO_READ;
+	writel_relaxed(val, priv->base + MDIO_CMD_REG);
+
+	ret = hip04_mdio_wait_ready(bus);
+	if (ret < 0)
+		goto out;
+	val = readl_relaxed(priv->base + MDIO_STA_REG);
+	if (val & MDIO_R_VALID) {
+		dev_err(bus->parent, "SMI bus read not valid\n");
+		ret = -ENODEV;
+		goto out;
+	}
+	val = readl_relaxed(priv->base + MDIO_RDATA_REG);
+	ret = val & 0xFFFF;
+out:
+	return ret;
+}
+
+static int hip04_mdio_write(struct mii_bus *bus, int mii_id,
+			    int regnum, u16 value)
+{
+	struct hip04_mdio_priv *priv = bus->priv;
+	u32 val;
+	int ret;
+
+	ret = hip04_mdio_wait_ready(bus);
+	if (ret < 0)
+		goto out;
+
+	writel_relaxed(value, priv->base + MDIO_WDATA_REG);
+	val = regnum | (mii_id << 5) | MDIO_WRITE;
+	writel_relaxed(val, priv->base + MDIO_CMD_REG);
+out:
+	return ret;
+}
+
+
+static int hip04_mdio_reset(struct mii_bus *bus)
+{
+	int temp, err, i;
+
+	for (i = 0; i < 2; i++) {
+		hip04_mdio_write(bus, i, 22, 0);
+		temp = hip04_mdio_read(bus, i, MII_BMCR);
+		temp |= BMCR_RESET;
+		err = hip04_mdio_write(bus, i, MII_BMCR, temp);
+		if (err < 0)
+			return err;
+	}
+
+	mdelay(500);
+	return 0;
+}
+
+static int hip04_mdio_probe(struct platform_device *pdev)
+{
+	struct resource *r;
+	struct mii_bus *bus;
+	struct hip04_mdio_priv *priv;
+	int ret;
+
+	r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (!r) {
+		dev_err(&pdev->dev, "No SMI register address given\n");
+		return -ENODEV;
+	}
+
+	bus = mdiobus_alloc_size(sizeof(struct hip04_mdio_priv));
+	if (!bus) {
+		dev_err(&pdev->dev, "Cannot allocate MDIO bus\n");
+		return -ENOMEM;
+	}
+
+	bus->name = "hip04_mdio_bus";
+	bus->read = hip04_mdio_read;
+	bus->write = hip04_mdio_write;
+	bus->reset = hip04_mdio_reset;
+	snprintf(bus->id, MII_BUS_ID_SIZE, "%s-mii",
+		 dev_name(&pdev->dev));
+	bus->parent = &pdev->dev;
+	priv = bus->priv;
+	priv->base = devm_ioremap(&pdev->dev, r->start, resource_size(r));
+	if (!priv->base) {
+		dev_err(&pdev->dev, "Unable to remap SMI register\n");
+		ret = -ENODEV;
+		goto out_mdio;
+	}
+
+	ret = of_mdiobus_register(bus, pdev->dev.of_node);
+	if (ret < 0) {
+		dev_err(&pdev->dev, "Cannot register MDIO bus (%d)\n", ret);
+		goto out_mdio;
+	}
+
+	platform_set_drvdata(pdev, bus);
+
+	return 0;
+
+out_mdio:
+	mdiobus_free(bus);
+	return ret;
+}
+
+static int hip04_mdio_remove(struct platform_device *pdev)
+{
+	struct mii_bus *bus = platform_get_drvdata(pdev);
+
+	mdiobus_unregister(bus);
+	mdiobus_free(bus);
+
+	return 0;
+}
+
+static const struct of_device_id hip04_mdio_match[] = {
+	{ .compatible = "hisilicon,hip04-mdio" },
+	{ }
+};
+MODULE_DEVICE_TABLE(of, hip04_mdio_match);
+
+static struct platform_driver hip04_mdio_driver = {
+	.probe = hip04_mdio_probe,
+	.remove = hip04_mdio_remove,
+	.driver = {
+		.name = "hip04-mdio",
+		.owner = THIS_MODULE,
+		.of_match_table = hip04_mdio_match,
+	},
+};
+
+module_platform_driver(hip04_mdio_driver);
+
+MODULE_DESCRIPTION("Hisilicon hip04 MDIO interface driver");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:hip04-mdio");
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-18  8:40 ` Zhangfei Gao
@ 2014-03-18  8:40     ` Zhangfei Gao
  -1 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-18  8:40 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-arm-kernel, netdev, devicetree, Zhangfei Gao

Add support for the Hisilicon hip04 Ethernet driver, covering both the
100M and 1000M controllers.

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
---
 drivers/net/ethernet/hisilicon/Makefile    |    2 +-
 drivers/net/ethernet/hisilicon/hip04_eth.c |  717 ++++++++++++++++++++++++++++
 2 files changed, 718 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c

diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 1d6eb6e..e6fe7af 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -2,4 +2,4 @@
 # Makefile for the HISILICON network device drivers.
 #
 
-obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
+obj-$(CONFIG_HIP04_ETH) += hip04_eth.o hip04_mdio.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
new file mode 100644
index 0000000..b12e0df
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -0,0 +1,717 @@
+
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+#include <linux/of_address.h>
+#include <linux/dmapool.h>
+#include <linux/phy.h>
+#include <linux/of_mdio.h>
+
+#define PPE_CFG_RX_CFF_ADDR		0x100
+#define PPE_CFG_POOL_GRP		0x300
+#define PPE_CFG_RX_BUF_SIZE		0x400
+#define PPE_CFG_RX_FIFO_SIZE		0x500
+#define PPE_CURR_BUF_CNT_REG		0xa200
+
+#define GE_DUPLEX_TYPE			0x8
+#define GE_MAX_FRM_SIZE_REG		0x3c
+#define GE_PORT_MODE			0x40
+#define GE_PORT_EN			0x44
+#define GE_SHORT_RUNTS_THR_REG		0x50
+#define GE_TX_LOCAL_PAGE_REG		0x5c
+#define GE_TRANSMIT_CONTROL_REG		0x60
+#define GE_CF_CRC_STRIP_REG		0x1b0
+#define GE_MODE_CHANGE_EN		0x1b4
+#define GE_RECV_CONTROL_REG		0x1e0
+#define GE_STATION_MAC_ADDRESS		0x210
+#define PPE_CFG_TX_PKT_BD_ADDR		0x420
+#define PPE_CFG_MAX_FRAME_LEN_REG	0x408
+#define PPE_CFG_BUS_CTRL_REG		0x424
+#define PPE_CFG_RX_CTRL_REG		0x428
+#define PPE_CFG_RX_PKT_MODE_REG		0x438
+#define PPE_CFG_QOS_VMID_GEN		0x500
+#define PPE_CFG_RX_PKT_INT		0x538
+#define PPE_INTEN			0x600
+#define PPE_INTSTS			0x608
+#define PPE_RINT			0x604
+#define PPE_CFG_STS_MODE		0x700
+#define PPE_HIS_RX_PKT_CNT		0x804
+
+/* REG_INTERRUPT */
+#define RCV_INT				BIT(10)
+#define RCV_NOBUF			BIT(8)
+#define DEF_INT_MASK			0x41fdf
+
+#define RX_DESC_NUM			64
+#define TX_DESC_NUM			64
+#define TX_NEXT(N)			(((N) + 1) & (TX_DESC_NUM-1))
+#define RX_NEXT(N)			(((N) + 1) & (RX_DESC_NUM-1))
+
+#define GMAC_PPE_RX_PKT_MAX_LEN		379
+#define GMAC_MAX_PKT_LEN		1516
+#define DESC_DEF_CFG			0x14
+#define RX_BUF_SIZE			1600
+#define TX_TIMEOUT			(6 * HZ)
+
+#define DRV_NAME			"hip04-ether"
+
+struct tx_desc {
+	u32 send_addr;
+	u16 send_size;
+	u16 reserved[3];
+	u32 cfg;
+	u32 wb_addr;
+};
+
+struct rx_desc {
+	u16 pkt_len;
+	u16 reserved_16;
+	u32 reserve[8];
+};
+
+struct hip04_priv {
+	void __iomem *base;
+	unsigned int port;
+	unsigned int speed;
+	unsigned int duplex;
+	unsigned int id;
+	unsigned int reg_inten;
+
+	struct napi_struct napi;
+	struct net_device *ndev;
+	struct dma_pool *desc_pool;
+
+	struct sk_buff *tx_skb[TX_DESC_NUM];
+	struct tx_desc *td_ring[TX_DESC_NUM];
+	dma_addr_t td_phys[TX_DESC_NUM];
+	spinlock_t txlock;
+	unsigned int tx_head;
+	unsigned int tx_tail;
+	unsigned int tx_count;
+
+	unsigned char *rx_buf[RX_DESC_NUM];
+	unsigned int rx_head;
+	unsigned int rx_buf_size;
+
+	struct device_node *phy_node;
+	struct phy_device *phy;
+};
+
+static void __iomem *ppebase;
+
+static void hip04_config_port(struct hip04_priv *priv, u32 speed, u32 duplex)
+{
+	u32 val;
+
+	priv->speed = speed;
+	priv->duplex = duplex;
+
+	switch (speed) {
+	case SPEED_1000:
+		val = 8;
+		break;
+	case SPEED_100:
+		if (priv->id)
+			val = 7;
+		else
+			val = 1;
+		break;
+	default:
+		val = 0;
+		break;
+	}
+	writel_relaxed(val, priv->base + GE_PORT_MODE);
+
+	val = (duplex) ? BIT(0) : 0;
+	writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
+
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_MODE_CHANGE_EN);
+}
+
+static void hip04_reset_ppe(struct hip04_priv *priv)
+{
+	u32 val;
+
+	do {
+		val =
+		readl_relaxed(ppebase + priv->port * 4 + PPE_CURR_BUF_CNT_REG);
+		readl_relaxed(ppebase + priv->port * 4 + PPE_CFG_RX_CFF_ADDR);
+	} while (val & 0xfff);
+}
+
+static void hip04_config_fifo(struct hip04_priv *priv)
+{
+	u32 val;
+
+	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
+	val |= BIT(12);			/* PPE_HIS_RX_PKT_CNT read clear */
+	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
+
+	val = BIT(priv->port);
+	writel_relaxed(val, ppebase + priv->port * 4 + PPE_CFG_POOL_GRP);
+
+	val = priv->port << 8;
+	val |= BIT(14);
+	writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
+
+	val = RX_BUF_SIZE;
+	writel_relaxed(val, ppebase + priv->port * 4 + PPE_CFG_RX_BUF_SIZE);
+
+	val = RX_DESC_NUM << 16;	/* depth */
+	val |= BIT(11);			/* seq: first set first use */
+	val |= RX_DESC_NUM * priv->id;	/* start_addr */
+	writel_relaxed(val, ppebase + priv->port * 4 + PPE_CFG_RX_FIFO_SIZE);
+
+	/* pkt store format */
+	val = NET_IP_ALIGN << 11;	/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
+
+	/* following cfg required for 1000M */
+	/* pkt mode */
+	val = BIT(18);			/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
+
+	/* set bus ctrl */
+	val = BIT(14);			/* buffer locally release */
+	val |= BIT(0);			/* big endian */
+	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
+
+	/* set max pkt len, curtail if exceed */
+	val = GMAC_PPE_RX_PKT_MAX_LEN;	/* max buffer len */
+	writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
+
+	/* set max len of each pkt */
+	val = GMAC_MAX_PKT_LEN;		/* max buffer len */
+	writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
+
+	/* set min len of each pkt */
+	val = 31;			/* min buffer len */
+	writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
+
+	/* tx */
+	val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
+	val |= BIT(5);			/* tx auto neg */
+	val |= BIT(6);			/* tx add crc */
+	val |= BIT(7);			/* tx short pad through */
+	writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
+
+	/* rx crc */
+	val = BIT(0);			/* rx strip crc */
+	writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
+
+	/* rx */
+	val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
+	val |= BIT(3);			/* rx strip pad */
+	val |= BIT(4);			/* run pkt en */
+	writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
+
+	/* auto neg control */
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
+}
+
+static void hip04_mac_enable(struct net_device *ndev, bool enable)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	if (enable) {
+		/* enable tx & rx */
+		val = readl_relaxed(priv->base + GE_PORT_EN);
+		val |= BIT(1);		/* rx */
+		val |= BIT(2);		/* tx */
+		writel_relaxed(val, priv->base + GE_PORT_EN);
+
+		/* enable interrupt */
+		priv->reg_inten = DEF_INT_MASK;
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+		/* clear rx int */
+		val = RCV_INT;
+		writel_relaxed(val, priv->base + PPE_RINT);
+
+		/* config recv int */
+		val = BIT(6);		/* interrupt threshold: 1 packet */
+		val |= 0x4;		/* recv timeout */
+		writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
+	} else {
+		/* disable int */
+		priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+		/* disable tx & rx */
+		val = readl_relaxed(priv->base + GE_PORT_EN);
+		val &= ~(BIT(1));	/* rx */
+		val &= ~(BIT(2));	/* tx */
+		writel_relaxed(val, priv->base + GE_PORT_EN);
+	}
+}
+
+static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	writel_relaxed(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
+}
+
+static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	writel_relaxed(phys, ppebase + priv->port * 4 + PPE_CFG_RX_CFF_ADDR);
+}
+
+static u32 hip04_recv_cnt(struct hip04_priv *priv)
+{
+	return readl_relaxed(priv->base + PPE_HIS_RX_PKT_CNT);
+}
+
+static void hip04_update_mac_address(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
+			priv->base + GE_STATION_MAC_ADDRESS);
+	writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
+			(ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
+			priv->base + GE_STATION_MAC_ADDRESS + 4);
+}
+
+static int hip04_set_mac_address(struct net_device *ndev, void *addr)
+{
+	eth_mac_addr(ndev, addr);
+	hip04_update_mac_address(ndev);
+	return 0;
+}
+
+static void endian_change(void *p, int size)
+{
+	unsigned int *to_cover = (unsigned int *)p;
+	int i;
+
+	size = size >> 2;
+	for (i = 0; i < size; i++)
+		*(to_cover+i) = htonl(*(to_cover+i));
+}
+
+static int hip04_rx_poll(struct napi_struct *napi, int budget)
+{
+	struct hip04_priv *priv = container_of(napi,
+			      struct hip04_priv, napi);
+	struct net_device *ndev = priv->ndev;
+	struct sk_buff *skb;
+	struct rx_desc *desc;
+	unsigned char *buf;
+	int rx = 0;
+	unsigned int cnt = hip04_recv_cnt(priv);
+	unsigned int len, tmp[16];
+
+	while (cnt) {
+		buf = priv->rx_buf[priv->rx_head];
+		skb = build_skb(buf, priv->rx_buf_size);
+		if (unlikely(!skb)) {
+			net_dbg_ratelimited("build_skb failed\n");
+			break;
+		}
+		dma_map_single(&ndev->dev, skb->data,
+			RX_BUF_SIZE, DMA_FROM_DEVICE);
+		memcpy(tmp, skb->data, 64);
+		endian_change((void *)tmp, 64);
+		desc = (struct rx_desc *)tmp;
+		len = desc->pkt_len;
+
+		if (len > RX_BUF_SIZE)
+			len = RX_BUF_SIZE;
+		if (0 == len)
+			break;
+
+		skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+		skb_put(skb, len);
+		skb->protocol = eth_type_trans(skb, ndev);
+		napi_gro_receive(&priv->napi, skb);
+
+		buf = netdev_alloc_frag(priv->rx_buf_size);
+		if (!buf)
+			return -ENOMEM;
+		priv->rx_buf[priv->rx_head] = buf;
+		dma_map_single(&ndev->dev, buf, RX_BUF_SIZE, DMA_TO_DEVICE);
+		hip04_set_recv_desc(priv, virt_to_phys(buf));
+
+		priv->rx_head = RX_NEXT(priv->rx_head);
+		if (rx++ >= budget)
+			break;
+
+		if (--cnt == 0)
+			cnt = hip04_recv_cnt(priv);
+	}
+
+	if (rx < budget) {
+		napi_gro_flush(napi, false);
+		__napi_complete(napi);
+	}
+
+	/* enable rx interrupt */
+	priv->reg_inten |= RCV_INT | RCV_NOBUF;
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+	return rx;
+}
+
+static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
+{
+	struct net_device *ndev = (struct net_device *) dev_id;
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
+	u32 val = DEF_INT_MASK;
+
+	writel_relaxed(val, priv->base + PPE_RINT);
+
+	if ((ists & RCV_INT) || (ists & RCV_NOBUF)) {
+		if (napi_schedule_prep(&priv->napi)) {
+			/* disable rx interrupt */
+			priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+			writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+			__napi_schedule(&priv->napi);
+		}
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void hip04_tx_reclaim(struct net_device *ndev, bool force)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	unsigned tx_head = priv->tx_head;
+	unsigned tx_tail = priv->tx_tail;
+	struct tx_desc *desc = priv->td_ring[priv->tx_tail];
+
+	spin_lock_irq(&priv->txlock);
+	while (tx_tail != tx_head) {
+		if (desc->send_addr != 0) {
+			if (force)
+				desc->send_addr = 0;
+			else
+				break;
+		}
+		dev_kfree_skb_irq(priv->tx_skb[tx_tail]);
+		priv->tx_skb[tx_tail] = NULL;
+		tx_tail = TX_NEXT(tx_tail);
+		priv->tx_count--;
+	}
+	priv->tx_tail = tx_tail;
+	spin_unlock_irq(&priv->txlock);
+}
+
+static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct tx_desc *desc = priv->td_ring[priv->tx_head];
+	unsigned int tx_head = priv->tx_head;
+	int ret;
+
+	hip04_tx_reclaim(ndev, false);
+
+	spin_lock_irq(&priv->txlock);
+	if (priv->tx_count++ >= TX_DESC_NUM) {
+		net_dbg_ratelimited("no TX space for packet\n");
+		netif_stop_queue(ndev);
+		ret = NETDEV_TX_BUSY;
+		goto out_unlock;
+	}
+
+	priv->tx_skb[tx_head] = skb;
+	dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
+	memset((void *)desc, 0, sizeof(*desc));
+	desc->send_addr = (unsigned int)virt_to_phys(skb->data);
+	desc->send_size = skb->len;
+	desc->cfg = DESC_DEF_CFG;
+	desc->wb_addr = priv->td_phys[tx_head];
+	endian_change(desc, 64);
+	skb_tx_timestamp(skb);
+	hip04_set_xmit_desc(priv, priv->td_phys[tx_head]);
+
+	priv->tx_head = TX_NEXT(tx_head);
+	ret = NETDEV_TX_OK;
+out_unlock:
+	spin_unlock_irq(&priv->txlock);
+
+	return ret;
+}
+
+static void hip04_adjust_link(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct phy_device *phy = priv->phy;
+
+	if ((priv->speed != phy->speed) || (priv->duplex != phy->duplex)) {
+		hip04_config_port(priv, phy->speed, phy->duplex);
+		phy_print_status(phy);
+	}
+}
+
+static int hip04_mac_open(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	hip04_reset_ppe(priv);
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		dma_map_single(&ndev->dev, priv->rx_buf[i],
+				RX_BUF_SIZE, DMA_TO_DEVICE);
+		hip04_set_recv_desc(priv, virt_to_phys(priv->rx_buf[i]));
+	}
+	priv->rx_head = 0;
+	priv->tx_head = 0;
+	priv->tx_tail = 0;
+	priv->tx_count = 0;
+
+	if (priv->phy_node) {
+		priv->phy = of_phy_connect(ndev, priv->phy_node,
+			&hip04_adjust_link, 0, PHY_INTERFACE_MODE_GMII);
+		if (!priv->phy)
+			return -ENODEV;
+		phy_start(priv->phy);
+	}
+
+	netif_start_queue(ndev);
+	hip04_mac_enable(ndev, true);
+	napi_enable(&priv->napi);
+	return 0;
+}
+
+static int hip04_mac_stop(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	if (priv->phy)
+		phy_disconnect(priv->phy);
+	priv->phy = NULL;
+
+	napi_disable(&priv->napi);
+	netif_stop_queue(ndev);
+	hip04_mac_enable(ndev, false);
+	hip04_tx_reclaim(ndev, true);
+	hip04_reset_ppe(priv);
+	return 0;
+}
+
+static void hip04_timeout(struct net_device *ndev)
+{
+	netif_wake_queue(ndev);
+}
+
+static struct net_device_ops hip04_netdev_ops = {
+	.ndo_open		= hip04_mac_open,
+	.ndo_stop		= hip04_mac_stop,
+	.ndo_start_xmit		= hip04_mac_start_xmit,
+	.ndo_set_mac_address	= hip04_set_mac_address,
+	.ndo_tx_timeout         = hip04_timeout,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_change_mtu		= eth_change_mtu,
+};
+
+static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->rx_buf_size = RX_BUF_SIZE +
+			    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+
+	priv->desc_pool = dma_pool_create(DRV_NAME, d, sizeof(struct tx_desc),
+				SKB_DATA_ALIGN(sizeof(struct tx_desc)),	0);
+	if (!priv->desc_pool)
+		return -ENOMEM;
+
+	for (i = 0; i < TX_DESC_NUM; i++) {
+		priv->td_ring[i] = dma_pool_alloc(priv->desc_pool,
+					GFP_ATOMIC, &priv->td_phys[i]);
+		if (!priv->td_ring[i])
+			return -ENOMEM;
+	}
+
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		priv->rx_buf[i] = netdev_alloc_frag(priv->rx_buf_size);
+		if (!priv->rx_buf[i])
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void hip04_free_ring(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	for (i = 0; i < RX_DESC_NUM; i++)
+		if (priv->rx_buf[i])
+			put_page(virt_to_head_page(priv->rx_buf[i]));
+
+	for (i = 0; i < TX_DESC_NUM; i++) {
+		if (priv->tx_skb[i])
+			dev_kfree_skb_any(priv->tx_skb[i]);
+		if ((priv->desc_pool) && (priv->td_ring[i]))
+			dma_pool_free(priv->desc_pool, priv->td_ring[i],
+					priv->td_phys[i]);
+	}
+
+	if (priv->desc_pool)
+		dma_pool_destroy(priv->desc_pool);
+}
+
+static int hip04_mac_probe(struct platform_device *pdev)
+{
+	struct device *d = &pdev->dev;
+	struct device_node *node = d->of_node;
+	struct net_device *ndev;
+	struct hip04_priv *priv;
+	struct resource *res;
+	unsigned int irq, val;
+	int ret;
+
+	ndev = alloc_etherdev(sizeof(struct hip04_priv));
+	if (!ndev)
+		return -ENOMEM;
+
+	priv = netdev_priv(ndev);
+	priv->ndev = ndev;
+	platform_set_drvdata(pdev, ndev);
+	spin_lock_init(&priv->txlock);
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (!res) {
+		ret = -EINVAL;
+		goto init_fail;
+	}
+	ndev->base_addr = res->start;
+	priv->base = devm_ioremap_resource(d, res);
+	if (IS_ERR(priv->base)) {
+		ret = PTR_ERR(priv->base);
+		dev_err(d, "devm_ioremap_resource failed\n");
+		goto init_fail;
+	}
+
+	if (!ppebase) {
+		struct device_node *n;
+
+		n = of_find_compatible_node(NULL, NULL, "hisilicon,hip04-ppebase");
+		if (!n) {
+			ret = -EINVAL;
+			netdev_err(ndev, "hisilicon,hip04-ppebase node not found\n");
+			goto init_fail;
+		}
+		ppebase = of_iomap(n, 0);
+	}
+
+	ret = of_property_read_u32(node, "port", &val);
+	if (ret) {
+		dev_warn(d, "no port property found\n");
+		goto init_fail;
+	}
+	priv->port = val & 0x1f;
+
+	ret = of_property_read_u32(node, "speed", &val);
+	if (ret) {
+		dev_warn(d, "no speed property found, assuming 1000M\n");
+		val = SPEED_1000;
+	}
+
+	if (val == SPEED_100)
+		priv->speed = SPEED_100;
+	else
+		priv->speed = SPEED_1000;
+	priv->duplex = DUPLEX_FULL;
+
+	ret = of_property_read_u32(node, "id", &priv->id);
+	if (ret) {
+		dev_warn(d, "no id property found\n");
+		goto init_fail;
+	}
+	irq = platform_get_irq(pdev, 0);
+	if (irq <= 0) {
+		ret = -EINVAL;
+		goto init_fail;
+	}
+	ether_setup(ndev);
+	ndev->netdev_ops = &hip04_netdev_ops;
+	ndev->watchdog_timeo = TX_TIMEOUT;
+	ndev->priv_flags |= IFF_UNICAST_FLT;
+	ndev->irq = irq;
+	netif_napi_add(ndev, &priv->napi, hip04_rx_poll, RX_DESC_NUM);
+	SET_NETDEV_DEV(ndev, &pdev->dev);
+
+	hip04_reset_ppe(priv);
+	hip04_config_port(priv, priv->speed, priv->duplex);
+	hip04_config_fifo(priv);
+	random_ether_addr(ndev->dev_addr);
+	hip04_update_mac_address(ndev);
+
+	ret = hip04_alloc_ring(ndev, d);
+	if (ret) {
+		netdev_err(ndev, "alloc ring fail\n");
+		goto alloc_fail;
+	}
+
+	ret = devm_request_irq(d, irq, hip04_mac_interrupt,
+					0, pdev->name, ndev);
+	if (ret) {
+		netdev_err(ndev, "devm_request_irq failed\n");
+		goto alloc_fail;
+	}
+
+	priv->phy_node = of_parse_phandle(node, "phy-handle", 0);
+
+	ret = register_netdev(ndev);
+	if (ret)
+		goto alloc_fail;
+
+	return 0;
+alloc_fail:
+	hip04_free_ring(ndev);
+init_fail:
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+	return ret;
+}
+
+static int hip04_remove(struct platform_device *pdev)
+{
+	struct net_device *ndev = platform_get_drvdata(pdev);
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	unregister_netdev(ndev);
+	hip04_free_ring(ndev);
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+
+	return 0;
+}
+
+static const struct of_device_id hip04_mac_match[] = {
+	{ .compatible = "hisilicon,hip04-mac" },
+	{ }
+};
+
+static struct platform_driver hip04_mac_driver = {
+	.probe	= hip04_mac_probe,
+	.remove	= hip04_remove,
+	.driver	= {
+		.name		= DRV_NAME,
+		.owner		= THIS_MODULE,
+		.of_match_table	= hip04_mac_match,
+	},
+};
+module_platform_driver(hip04_mac_driver);
+
+MODULE_DESCRIPTION("Hisilicon HIP04 Ethernet driver");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:hip04-ether");
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
@ 2014-03-18  8:40     ` Zhangfei Gao
  0 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-18  8:40 UTC (permalink / raw)
  To: linux-arm-kernel

Add support for the Hisilicon hip04 ethernet controller, covering both the 100M and 1000M variants

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
---
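For illustration, a device tree fragment consuming this binding might look like the sketch below. This is inferred from the properties the probe routine reads (`port`, `speed`, `id`, `phy-handle`) and the compatible strings it matches; the register addresses, interrupt numbers, and node labels here are placeholders, not taken from the actual binding document in patch 1/3.

```dts
	/* hypothetical example node layout */
	ppe: ppe@28c0000 {
		compatible = "hisilicon,hip04-ppebase";
		reg = <0x28c0000 0x10000>;	/* address is an assumption */
	};

	eth0: ethernet@28b0000 {
		compatible = "hisilicon,hip04-mac";
		reg = <0x28b0000 0x10000>;	/* address is an assumption */
		interrupts = <0 413 4>;		/* interrupt is an assumption */
		port = <31>;
		speed = <100>;
		id = <0>;
		phy-handle = <&phy0>;
	};
```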
 drivers/net/ethernet/hisilicon/Makefile    |    2 +-
 drivers/net/ethernet/hisilicon/hip04_eth.c |  717 ++++++++++++++++++++++++++++
 2 files changed, 718 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c

diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 1d6eb6e..e6fe7af 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -2,4 +2,4 @@
 # Makefile for the HISILICON network device drivers.
 #
 
-obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
+obj-$(CONFIG_HIP04_ETH) += hip04_eth.o hip04_mdio.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
new file mode 100644
index 0000000..b12e0df
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -0,0 +1,717 @@
+
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+#include <linux/of_address.h>
+#include <linux/dmapool.h>
+#include <linux/phy.h>
+#include <linux/of_mdio.h>
+
+#define PPE_CFG_RX_CFF_ADDR		0x100
+#define PPE_CFG_POOL_GRP		0x300
+#define PPE_CFG_RX_BUF_SIZE		0x400
+#define PPE_CFG_RX_FIFO_SIZE		0x500
+#define PPE_CURR_BUF_CNT_REG		0xa200
+
+#define GE_DUPLEX_TYPE			0x8
+#define GE_MAX_FRM_SIZE_REG		0x3c
+#define GE_PORT_MODE			0x40
+#define GE_PORT_EN			0x44
+#define GE_SHORT_RUNTS_THR_REG		0x50
+#define GE_TX_LOCAL_PAGE_REG		0x5c
+#define GE_TRANSMIT_CONTROL_REG		0x60
+#define GE_CF_CRC_STRIP_REG		0x1b0
+#define GE_MODE_CHANGE_EN		0x1b4
+#define GE_RECV_CONTROL_REG		0x1e0
+#define GE_STATION_MAC_ADDRESS		0x210
+#define PPE_CFG_TX_PKT_BD_ADDR		0x420
+#define PPE_CFG_MAX_FRAME_LEN_REG	0x408
+#define PPE_CFG_BUS_CTRL_REG		0x424
+#define PPE_CFG_RX_CTRL_REG		0x428
+#define PPE_CFG_RX_PKT_MODE_REG		0x438
+#define PPE_CFG_QOS_VMID_GEN		0x500
+#define PPE_CFG_RX_PKT_INT		0x538
+#define PPE_INTEN			0x600
+#define PPE_INTSTS			0x608
+#define PPE_RINT			0x604
+#define PPE_CFG_STS_MODE		0x700
+#define PPE_HIS_RX_PKT_CNT		0x804
+
+/* REG_INTERRUPT */
+#define RCV_INT				BIT(10)
+#define RCV_NOBUF			BIT(8)
+#define DEF_INT_MASK			0x41fdf
+
+#define RX_DESC_NUM			64
+#define TX_DESC_NUM			64
+#define TX_NEXT(N)			(((N) + 1) & (TX_DESC_NUM-1))
+#define RX_NEXT(N)			(((N) + 1) & (RX_DESC_NUM-1))
+
+#define GMAC_PPE_RX_PKT_MAX_LEN		379
+#define GMAC_MAX_PKT_LEN		1516
+#define DESC_DEF_CFG			0x14
+#define RX_BUF_SIZE			1600
+#define TX_TIMEOUT			(6 * HZ)
+
+#define DRV_NAME			"hip04-ether"
+
+struct tx_desc {
+	u32 send_addr;
+	u16 send_size;
+	u16 reserved[3];
+	u32 cfg;
+	u32 wb_addr;
+};
+
+struct rx_desc {
+	u16 pkt_len;
+	u16 reserved_16;
+	u32 reserve[8];
+};
+
+struct hip04_priv {
+	void __iomem *base;
+	unsigned int port;
+	unsigned int speed;
+	unsigned int duplex;
+	unsigned int id;
+	unsigned int reg_inten;
+
+	struct napi_struct napi;
+	struct net_device *ndev;
+	struct dma_pool *desc_pool;
+
+	struct sk_buff *tx_skb[TX_DESC_NUM];
+	struct tx_desc *td_ring[TX_DESC_NUM];
+	dma_addr_t td_phys[TX_DESC_NUM];
+	spinlock_t txlock;
+	unsigned int tx_head;
+	unsigned int tx_tail;
+	unsigned int tx_count;
+
+	unsigned char *rx_buf[RX_DESC_NUM];
+	unsigned int rx_head;
+	unsigned int rx_buf_size;
+
+	struct device_node *phy_node;
+	struct phy_device *phy;
+};
+
+static void __iomem *ppebase;
+
+static void hip04_config_port(struct hip04_priv *priv, u32 speed, u32 duplex)
+{
+	u32 val;
+
+	priv->speed = speed;
+	priv->duplex = duplex;
+
+	switch (speed) {
+	case SPEED_1000:
+		val = 8;
+		break;
+	case SPEED_100:
+		if (priv->id)
+			val = 7;
+		else
+			val = 1;
+		break;
+	default:
+		val = 0;
+		break;
+	}
+	writel_relaxed(val, priv->base + GE_PORT_MODE);
+
+	val = (duplex) ? BIT(0) : 0;
+	writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
+
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_MODE_CHANGE_EN);
+}
+
+static void hip04_reset_ppe(struct hip04_priv *priv)
+{
+	u32 val;
+
+	do {
+		val =
+		readl_relaxed(ppebase + priv->port * 4 + PPE_CURR_BUF_CNT_REG);
+		readl_relaxed(ppebase + priv->port * 4 + PPE_CFG_RX_CFF_ADDR);
+	} while (val & 0xfff);
+}
+
+static void hip04_config_fifo(struct hip04_priv *priv)
+{
+	u32 val;
+
+	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
+	val |= BIT(12);			/* PPE_HIS_RX_PKT_CNT read clear */
+	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
+
+	val = BIT(priv->port);
+	writel_relaxed(val, ppebase + priv->port * 4 + PPE_CFG_POOL_GRP);
+
+	val = priv->port << 8;
+	val |= BIT(14);
+	writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
+
+	val = RX_BUF_SIZE;
+	writel_relaxed(val, ppebase + priv->port * 4 + PPE_CFG_RX_BUF_SIZE);
+
+	val = RX_DESC_NUM << 16;	/* depth */
+	val |= BIT(11);			/* seq: first set first use */
+	val |= RX_DESC_NUM * priv->id;	/* start_addr */
+	writel_relaxed(val, ppebase + priv->port * 4 + PPE_CFG_RX_FIFO_SIZE);
+
+	/* pkt store format */
+	val = NET_IP_ALIGN << 11;	/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
+
+	/* following cfg required for 1000M */
+	/* pkt mode */
+	val = BIT(18);			/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
+
+	/* set bus ctrl */
+	val = BIT(14);			/* buffer locally release */
+	val |= BIT(0);			/* big endian */
+	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
+
+	/* set max pkt len, truncate if exceeded */
+	val = GMAC_PPE_RX_PKT_MAX_LEN;	/* max buffer len */
+	writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
+
+	/* set max len of each pkt */
+	val = GMAC_MAX_PKT_LEN;		/* max buffer len */
+	writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
+
+	/* set min len of each pkt */
+	val = 31;			/* min buffer len */
+	writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
+
+	/* tx */
+	val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
+	val |= BIT(5);			/* tx auto neg */
+	val |= BIT(6);			/* tx add crc */
+	val |= BIT(7);			/* tx short pad through */
+	writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
+
+	/* rx crc */
+	val = BIT(0);			/* rx strip crc */
+	writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
+
+	/* rx */
+	val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
+	val |= BIT(3);			/* rx strip pad */
+	val |= BIT(4);			/* runt pkt en */
+	writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
+
+	/* auto neg control */
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
+}
+
+static void hip04_mac_enable(struct net_device *ndev, bool enable)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	if (enable) {
+		/* enable tx & rx */
+		val = readl_relaxed(priv->base + GE_PORT_EN);
+		val |= BIT(1);		/* rx */
+		val |= BIT(2);		/* tx */
+		writel_relaxed(val, priv->base + GE_PORT_EN);
+
+		/* enable interrupt */
+		priv->reg_inten = DEF_INT_MASK;
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+		/* clear rx int */
+		val = RCV_INT;
+		writel_relaxed(val, priv->base + PPE_RINT);
+
+		/* config recv int */
+		val = BIT(6);		/* interrupt threshold: 1 packet */
+		val |= 0x4;		/* recv timeout */
+		writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
+	} else {
+		/* disable int */
+		priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+		/* disable tx & rx */
+		val = readl_relaxed(priv->base + GE_PORT_EN);
+		val &= ~(BIT(1));	/* rx */
+		val &= ~(BIT(2));	/* tx */
+		writel_relaxed(val, priv->base + GE_PORT_EN);
+	}
+}
+
+static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	writel_relaxed(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
+}
+
+static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	writel_relaxed(phys, ppebase + priv->port * 4 + PPE_CFG_RX_CFF_ADDR);
+}
+
+static u32 hip04_recv_cnt(struct hip04_priv *priv)
+{
+	return readl_relaxed(priv->base + PPE_HIS_RX_PKT_CNT);
+}
+
+static void hip04_update_mac_address(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
+			priv->base + GE_STATION_MAC_ADDRESS);
+	writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
+			(ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
+			priv->base + GE_STATION_MAC_ADDRESS + 4);
+}
+
+static int hip04_set_mac_address(struct net_device *ndev, void *addr)
+{
+	eth_mac_addr(ndev, addr);
+	hip04_update_mac_address(ndev);
+	return 0;
+}
+
+static void endian_change(void *p, int size)
+{
+	unsigned int *to_cover = (unsigned int *)p;
+	int i;
+
+	size = size >> 2;
+	for (i = 0; i < size; i++)
+		*(to_cover+i) = htonl(*(to_cover+i));
+}
+
+static int hip04_rx_poll(struct napi_struct *napi, int budget)
+{
+	struct hip04_priv *priv = container_of(napi,
+			      struct hip04_priv, napi);
+	struct net_device *ndev = priv->ndev;
+	struct sk_buff *skb;
+	struct rx_desc *desc;
+	unsigned char *buf;
+	int rx = 0;
+	unsigned int cnt = hip04_recv_cnt(priv);
+	unsigned int len, tmp[16];
+
+	while (cnt) {
+		buf = priv->rx_buf[priv->rx_head];
+		skb = build_skb(buf, priv->rx_buf_size);
+		if (unlikely(!skb)) {
+			net_dbg_ratelimited("build_skb failed\n");
+			break;
+		}
+		dma_map_single(&ndev->dev, skb->data,
+			RX_BUF_SIZE, DMA_FROM_DEVICE);
+		memcpy(tmp, skb->data, 64);
+		endian_change((void *)tmp, 64);
+		desc = (struct rx_desc *)tmp;
+		len = desc->pkt_len;
+
+		if (len > RX_BUF_SIZE)
+			len = RX_BUF_SIZE;
+		if (0 == len)
+			break;
+
+		skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+		skb_put(skb, len);
+		skb->protocol = eth_type_trans(skb, ndev);
+		napi_gro_receive(&priv->napi, skb);
+
+		buf = netdev_alloc_frag(priv->rx_buf_size);
+		if (!buf)
+			return -ENOMEM;
+		priv->rx_buf[priv->rx_head] = buf;
+		dma_map_single(&ndev->dev, buf, RX_BUF_SIZE, DMA_TO_DEVICE);
+		hip04_set_recv_desc(priv, virt_to_phys(buf));
+
+		priv->rx_head = RX_NEXT(priv->rx_head);
+		if (rx++ >= budget)
+			break;
+
+		if (--cnt == 0)
+			cnt = hip04_recv_cnt(priv);
+	}
+
+	if (rx < budget) {
+		napi_gro_flush(napi, false);
+		__napi_complete(napi);
+	}
+
+	/* enable rx interrupt */
+	priv->reg_inten |= RCV_INT | RCV_NOBUF;
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+	return rx;
+}
+
+static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
+{
+	struct net_device *ndev = (struct net_device *) dev_id;
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
+	u32 val = DEF_INT_MASK;
+
+	writel_relaxed(val, priv->base + PPE_RINT);
+
+	if ((ists & RCV_INT) || (ists & RCV_NOBUF)) {
+		if (napi_schedule_prep(&priv->napi)) {
+			/* disable rx interrupt */
+			priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+			writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+			__napi_schedule(&priv->napi);
+		}
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void hip04_tx_reclaim(struct net_device *ndev, bool force)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	unsigned tx_head = priv->tx_head;
+	unsigned tx_tail = priv->tx_tail;
+	struct tx_desc *desc = priv->td_ring[priv->tx_tail];
+
+	spin_lock_irq(&priv->txlock);
+	while (tx_tail != tx_head) {
+		if (desc->send_addr != 0) {
+			if (force)
+				desc->send_addr = 0;
+			else
+				break;
+		}
+		dev_kfree_skb_irq(priv->tx_skb[tx_tail]);
+		priv->tx_skb[tx_tail] = NULL;
+		tx_tail = TX_NEXT(tx_tail);
+		priv->tx_count--;
+	}
+	priv->tx_tail = tx_tail;
+	spin_unlock_irq(&priv->txlock);
+}
+
+static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct tx_desc *desc = priv->td_ring[priv->tx_head];
+	unsigned int tx_head = priv->tx_head;
+	int ret;
+
+	hip04_tx_reclaim(ndev, false);
+
+	spin_lock_irq(&priv->txlock);
+	if (priv->tx_count++ >= TX_DESC_NUM) {
+		net_dbg_ratelimited("no TX space for packet\n");
+		netif_stop_queue(ndev);
+		ret = NETDEV_TX_BUSY;
+		goto out_unlock;
+	}
+
+	priv->tx_skb[tx_head] = skb;
+	dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
+	memset((void *)desc, 0, sizeof(*desc));
+	desc->send_addr = (unsigned int)virt_to_phys(skb->data);
+	desc->send_size = skb->len;
+	desc->cfg = DESC_DEF_CFG;
+	desc->wb_addr = priv->td_phys[tx_head];
+	endian_change(desc, 64);
+	skb_tx_timestamp(skb);
+	hip04_set_xmit_desc(priv, priv->td_phys[tx_head]);
+
+	priv->tx_head = TX_NEXT(tx_head);
+	ret = NETDEV_TX_OK;
+out_unlock:
+	spin_unlock_irq(&priv->txlock);
+
+	return ret;
+}
+
+static void hip04_adjust_link(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct phy_device *phy = priv->phy;
+
+	if ((priv->speed != phy->speed) || (priv->duplex != phy->duplex)) {
+		hip04_config_port(priv, phy->speed, phy->duplex);
+		phy_print_status(phy);
+	}
+}
+
+static int hip04_mac_open(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	hip04_reset_ppe(priv);
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		dma_map_single(&ndev->dev, priv->rx_buf[i],
+				RX_BUF_SIZE, DMA_TO_DEVICE);
+		hip04_set_recv_desc(priv, virt_to_phys(priv->rx_buf[i]));
+	}
+	priv->rx_head = 0;
+	priv->tx_head = 0;
+	priv->tx_tail = 0;
+	priv->tx_count = 0;
+
+	if (priv->phy_node) {
+		priv->phy = of_phy_connect(ndev, priv->phy_node,
+			&hip04_adjust_link, 0, PHY_INTERFACE_MODE_GMII);
+		if (!priv->phy)
+			return -ENODEV;
+		phy_start(priv->phy);
+	}
+
+	netif_start_queue(ndev);
+	hip04_mac_enable(ndev, true);
+	napi_enable(&priv->napi);
+	return 0;
+}
+
+static int hip04_mac_stop(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	if (priv->phy)
+		phy_disconnect(priv->phy);
+	priv->phy = NULL;
+
+	napi_disable(&priv->napi);
+	netif_stop_queue(ndev);
+	hip04_mac_enable(ndev, false);
+	hip04_tx_reclaim(ndev, true);
+	hip04_reset_ppe(priv);
+	return 0;
+}
+
+static void hip04_timeout(struct net_device *ndev)
+{
+	netif_wake_queue(ndev);
+}
+
+static struct net_device_ops hip04_netdev_ops = {
+	.ndo_open		= hip04_mac_open,
+	.ndo_stop		= hip04_mac_stop,
+	.ndo_start_xmit		= hip04_mac_start_xmit,
+	.ndo_set_mac_address	= hip04_set_mac_address,
+	.ndo_tx_timeout         = hip04_timeout,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_change_mtu		= eth_change_mtu,
+};
+
+static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->rx_buf_size = RX_BUF_SIZE +
+			    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+
+	priv->desc_pool = dma_pool_create(DRV_NAME, d, sizeof(struct tx_desc),
+				SKB_DATA_ALIGN(sizeof(struct tx_desc)),	0);
+	if (!priv->desc_pool)
+		return -ENOMEM;
+
+	for (i = 0; i < TX_DESC_NUM; i++) {
+		priv->td_ring[i] = dma_pool_alloc(priv->desc_pool,
+					GFP_ATOMIC, &priv->td_phys[i]);
+		if (!priv->td_ring[i])
+			return -ENOMEM;
+	}
+
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		priv->rx_buf[i] = netdev_alloc_frag(priv->rx_buf_size);
+		if (!priv->rx_buf[i])
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void hip04_free_ring(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	for (i = 0; i < RX_DESC_NUM; i++)
+		if (priv->rx_buf[i])
+			put_page(virt_to_head_page(priv->rx_buf[i]));
+
+	for (i = 0; i < TX_DESC_NUM; i++) {
+		if (priv->tx_skb[i])
+			dev_kfree_skb_any(priv->tx_skb[i]);
+		if ((priv->desc_pool) && (priv->td_ring[i]))
+			dma_pool_free(priv->desc_pool, priv->td_ring[i],
+					priv->td_phys[i]);
+	}
+
+	if (priv->desc_pool)
+		dma_pool_destroy(priv->desc_pool);
+}
+
+static int hip04_mac_probe(struct platform_device *pdev)
+{
+	struct device *d = &pdev->dev;
+	struct device_node *node = d->of_node;
+	struct net_device *ndev;
+	struct hip04_priv *priv;
+	struct resource *res;
+	unsigned int irq, val;
+	int ret;
+
+	ndev = alloc_etherdev(sizeof(struct hip04_priv));
+	if (!ndev)
+		return -ENOMEM;
+
+	priv = netdev_priv(ndev);
+	priv->ndev = ndev;
+	platform_set_drvdata(pdev, ndev);
+	spin_lock_init(&priv->txlock);
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (!res) {
+		ret = -EINVAL;
+		goto init_fail;
+	}
+	ndev->base_addr = res->start;
+	priv->base = devm_ioremap_resource(d, res);
+	if (IS_ERR(priv->base)) {
+		ret = PTR_ERR(priv->base);
+		dev_err(d, "devm_ioremap_resource failed\n");
+		goto init_fail;
+	}
+
+	if (!ppebase) {
+		struct device_node *n;
+
+		n = of_find_compatible_node(NULL, NULL, "hisilicon,hip04-ppebase");
+		if (!n) {
+			ret = -EINVAL;
+			netdev_err(ndev, "hisilicon,hip04-ppebase node not found\n");
+			goto init_fail;
+		}
+		ppebase = of_iomap(n, 0);
+	}
+
+	ret = of_property_read_u32(node, "port", &val);
+	if (ret) {
+		dev_warn(d, "no port property found\n");
+		goto init_fail;
+	}
+	priv->port = val & 0x1f;
+
+	ret = of_property_read_u32(node, "speed", &val);
+	if (ret) {
+		dev_warn(d, "no speed property found, assuming 1000M\n");
+		val = SPEED_1000;
+	}
+
+	if (val == SPEED_100)
+		priv->speed = SPEED_100;
+	else
+		priv->speed = SPEED_1000;
+	priv->duplex = DUPLEX_FULL;
+
+	ret = of_property_read_u32(node, "id", &priv->id);
+	if (ret) {
+		dev_warn(d, "no id property found\n");
+		goto init_fail;
+	}
+	irq = platform_get_irq(pdev, 0);
+	if (irq <= 0) {
+		ret = -EINVAL;
+		goto init_fail;
+	}
+	ether_setup(ndev);
+	ndev->netdev_ops = &hip04_netdev_ops;
+	ndev->watchdog_timeo = TX_TIMEOUT;
+	ndev->priv_flags |= IFF_UNICAST_FLT;
+	ndev->irq = irq;
+	netif_napi_add(ndev, &priv->napi, hip04_rx_poll, RX_DESC_NUM);
+	SET_NETDEV_DEV(ndev, &pdev->dev);
+
+	hip04_reset_ppe(priv);
+	hip04_config_port(priv, priv->speed, priv->duplex);
+	hip04_config_fifo(priv);
+	random_ether_addr(ndev->dev_addr);
+	hip04_update_mac_address(ndev);
+
+	ret = hip04_alloc_ring(ndev, d);
+	if (ret) {
+		netdev_err(ndev, "alloc ring fail\n");
+		goto alloc_fail;
+	}
+
+	ret = devm_request_irq(d, irq, hip04_mac_interrupt,
+					0, pdev->name, ndev);
+	if (ret) {
+		netdev_err(ndev, "devm_request_irq failed\n");
+		goto alloc_fail;
+	}
+
+	priv->phy_node = of_parse_phandle(node, "phy-handle", 0);
+
+	ret = register_netdev(ndev);
+	if (ret)
+		goto alloc_fail;
+
+	return 0;
+alloc_fail:
+	hip04_free_ring(ndev);
+init_fail:
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+	return ret;
+}
+
+static int hip04_remove(struct platform_device *pdev)
+{
+	struct net_device *ndev = platform_get_drvdata(pdev);
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	unregister_netdev(ndev);
+	hip04_free_ring(ndev);
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+
+	return 0;
+}
+
+static const struct of_device_id hip04_mac_match[] = {
+	{ .compatible = "hisilicon,hip04-mac" },
+	{ }
+};
+
+static struct platform_driver hip04_mac_driver = {
+	.probe	= hip04_mac_probe,
+	.remove	= hip04_remove,
+	.driver	= {
+		.name		= DRV_NAME,
+		.owner		= THIS_MODULE,
+		.of_match_table	= hip04_mac_match,
+	},
+};
+module_platform_driver(hip04_mac_driver);
+
+MODULE_DESCRIPTION("HISILICON P04 Ethernet driver");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:hip04-ether");
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* Re: [PATCH 0/3] add hisilicon hip04 ethernet driver
  2014-03-18  8:40 ` Zhangfei Gao
@ 2014-03-18 10:27   ` Ding Tianhong
  -1 siblings, 0 replies; 148+ messages in thread
From: Ding Tianhong @ 2014-03-18 10:27 UTC (permalink / raw)
  To: Zhangfei Gao, David S. Miller; +Cc: netdev, linux-arm-kernel, devicetree

On 2014/3/18 16:40, Zhangfei Gao wrote:
> Zhangfei Gao (3):
>   Documentation: add Device tree bindings for Hisilicon hip04 ethernet
>   net: hisilicon: new hip04 MDIO driver
>   net: hisilicon: new hip04 ethernet driver
> 
>  .../bindings/net/hisilicon-hip04-net.txt           |   74 ++
>  drivers/net/ethernet/Kconfig                       |    1 +
>  drivers/net/ethernet/Makefile                      |    1 +
>  drivers/net/ethernet/hisilicon/Kconfig             |   31 +
>  drivers/net/ethernet/hisilicon/Makefile            |    5 +
>  drivers/net/ethernet/hisilicon/hip04_eth.c         |  717 ++++++++++++++++++++
>  drivers/net/ethernet/hisilicon/hip04_mdio.c        |  190 ++++++
>  7 files changed, 1019 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
>  create mode 100644 drivers/net/ethernet/hisilicon/Kconfig
>  create mode 100644 drivers/net/ethernet/hisilicon/Makefile
>  create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c
>  create mode 100644 drivers/net/ethernet/hisilicon/hip04_mdio.c
> 

Great work, thanks for upstreaming the driver.

Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-18  8:40     ` Zhangfei Gao
@ 2014-03-18 10:46         ` Russell King - ARM Linux
  -1 siblings, 0 replies; 148+ messages in thread
From: Russell King - ARM Linux @ 2014-03-18 10:46 UTC (permalink / raw)
  To: Zhangfei Gao
  Cc: David S. Miller, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA

I was just browsing this patch when I noticed some of these issues - I
haven't done a full review of this driver, I'm just commenting on the
things I've spotted.

On Tue, Mar 18, 2014 at 04:40:17PM +0800, Zhangfei Gao wrote:
> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
> +{
> +	struct hip04_priv *priv = container_of(napi,
> +			      struct hip04_priv, napi);
> +	struct net_device *ndev = priv->ndev;
> +	struct sk_buff *skb;
> +	struct rx_desc *desc;
> +	unsigned char *buf;
> +	int rx = 0;
> +	unsigned int cnt = hip04_recv_cnt(priv);
> +	unsigned int len, tmp[16];
> +
> +	while (cnt) {
> +		buf = priv->rx_buf[priv->rx_head];
> +		skb = build_skb(buf, priv->rx_buf_size);
> +		if (unlikely(!skb))
> +			net_dbg_ratelimited("build_skb failed\n");
> +		dma_map_single(&ndev->dev, skb->data,
> +			RX_BUF_SIZE, DMA_FROM_DEVICE);

This is incorrect.

buf = buffer alloc()
/* CPU owns buffer and can read/write it, device does not */
dev_addr = dma_map_single(dev, buf, ..., DMA_FROM_DEVICE);
/* Device owns buffer and can write it, CPU does not access it */
dma_unmap_single(dev, dev_addr, ..., DMA_FROM_DEVICE);
/* CPU owns buffer again and can read/write it, device does not */

Please turn on DMA API debugging in the kernel debug options and verify
whether your driver causes it to complain (it will.)

I think you want dma_unmap_single() here.

> +		memcpy(tmp, skb->data, 64);
> +		endian_change((void *)tmp, 64);
> +		desc = (struct rx_desc *)tmp;
> +		len = desc->pkt_len;

This is a rather expensive way to do this.  Presumably the descriptors
are always big endian?  If so, why not:

		desc = skb->data;
		len = be16_to_cpu(desc->pkt_len);

?  You may need to lay the struct out differently for this to work so
the offset which pkt_len accesses is correct.

Also... do you not have any flags which indicate whether the packet
received was in error?

> +
> +		if (len > RX_BUF_SIZE)
> +			len = RX_BUF_SIZE;
> +		if (0 == len)
> +			break;
> +
> +		skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
> +		skb_put(skb, len);
> +		skb->protocol = eth_type_trans(skb, ndev);
> +		napi_gro_receive(&priv->napi, skb);
> +
> +		buf = netdev_alloc_frag(priv->rx_buf_size);
> +		if (!buf)
> +			return -ENOMEM;
> +		priv->rx_buf[priv->rx_head] = buf;
> +		dma_map_single(&ndev->dev, buf, RX_BUF_SIZE, DMA_TO_DEVICE);
> +		hip04_set_recv_desc(priv, virt_to_phys(buf));

No need for virt_to_phys() here - dma_map_single() returns the device
address.

> +
> +		priv->rx_head = RX_NEXT(priv->rx_head);
> +		if (rx++ >= budget)
> +			break;
> +
> +		if (--cnt == 0)
> +			cnt = hip04_recv_cnt(priv);
> +	}
> +
> +	if (rx < budget) {
> +		napi_gro_flush(napi, false);
> +		__napi_complete(napi);
> +	}
> +
> +	/* enable rx interrupt */
> +	priv->reg_inten |= RCV_INT | RCV_NOBUF;
> +	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);

This doesn't look right - you're supposed to re-enable receive interrupts
when you receive less than "budget" packets.

> +static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
> +{
> +	struct net_device *ndev = (struct net_device *) dev_id;
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
> +	u32 val = DEF_INT_MASK;
> +
> +	writel_relaxed(val, priv->base + PPE_RINT);
> +
> +	if ((ists & RCV_INT) || (ists & RCV_NOBUF)) {

What you get with this is the compiler generating code to test RCV_INT,
and then if that's false, code to test RCV_NOBUF.  There's no possibility
for the compiler to optimise that because it's part of the language spec
that condition1 || condition2 will always have condition1 evaluated first,
and condition2 will only be evaluated if condition1 was false.

	if (ists & (RCV_INT | RCV_NOBUF)) {

would more than likely be more efficient here.

> +		if (napi_schedule_prep(&priv->napi)) {
> +			/* disable rx interrupt */
> +			priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
> +			writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
> +			__napi_schedule(&priv->napi);
> +		}
> +	}
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static void hip04_tx_reclaim(struct net_device *ndev, bool force)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	unsigned tx_head = priv->tx_head;
> +	unsigned tx_tail = priv->tx_tail;
> +	struct tx_desc *desc = priv->td_ring[priv->tx_tail];
> +
> +	spin_lock_irq(&priv->txlock);

Do you know for certain that interrupts were (and always will be) definitely
enabled prior to this point?  If not, you should use spin_lock_irqsave()..
spin_unlock_irqrestore().

> +	while (tx_tail != tx_head) {
> +		if (desc->send_addr != 0) {
> +			if (force)
> +				desc->send_addr = 0;
> +			else
> +				break;
> +		}

dma_unmap_single(&ndev->dev, dev_addr, skb->len, DMA_TO_DEVICE) ?

It looks like your device zeros the send address when it has finished
transmitting - if this is true, then you will need to store dev_addr
separately for each transmit packet.

> +		dev_kfree_skb_irq(priv->tx_skb[tx_tail]);
> +		priv->tx_skb[tx_tail] = NULL;
> +		tx_tail = TX_NEXT(tx_tail);
> +		priv->tx_count--;

No processing of transmit statistics?

> +	}
> +	priv->tx_tail = tx_tail;
> +	spin_unlock_irq(&priv->txlock);

If you have freed up any packets, then you should call netif_wake_queue().
Do you not get any interrupts when a packet is transmitted?

> +}
> +
> +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	struct tx_desc *desc = priv->td_ring[priv->tx_head];
> +	unsigned int tx_head = priv->tx_head;
> +	int ret;
> +
> +	hip04_tx_reclaim(ndev, false);
> +
> +	spin_lock_irq(&priv->txlock);

Same comment here...

> +	if (priv->tx_count++ >= TX_DESC_NUM) {
> +		net_dbg_ratelimited("no TX space for packet\n");
> +		netif_stop_queue(ndev);
> +		ret = NETDEV_TX_BUSY;
> +		goto out_unlock;
> +	}

You shouldn't rely on this - you should stop the queue when you put the
last packet that fills the ring, before returning from this function.
Then, when you clean the ring in your hip04_tx_reclaim() function, wake
the queue.

> +
> +	priv->tx_skb[tx_head] = skb;
> +	dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
> +	memset((void *)desc, 0, sizeof(*desc));
> +	desc->send_addr = (unsigned int)virt_to_phys(skb->data);

Again, dma_map_single() gives you the device address, there's no need
to use virt_to_phys(), and there should be no need for a cast here
either.  Also consider cpu_to_be32() and similar for the other descriptor
writes.

> +	desc->send_size = skb->len;
> +	desc->cfg = DESC_DEF_CFG;
> +	desc->wb_addr = priv->td_phys[tx_head];
> +	endian_change(desc, 64);
> +	skb_tx_timestamp(skb);
> +	hip04_set_xmit_desc(priv, priv->td_phys[tx_head]);
> +
> +	priv->tx_head = TX_NEXT(tx_head);
> +	ret = NETDEV_TX_OK;

As mentioned above, if you have filled the ring, you need to also call
netif_stop_queue() here.

> +static int hip04_mac_probe(struct platform_device *pdev)
> +{
> +	struct device *d = &pdev->dev;
> +	struct device_node *node = d->of_node;
> +	struct net_device *ndev;
> +	struct hip04_priv *priv;
> +	struct resource *res;
> +	unsigned int irq, val;
> +	int ret;
> +
> +	ndev = alloc_etherdev(sizeof(struct hip04_priv));
> +	if (!ndev)
> +		return -ENOMEM;
> +
> +	priv = netdev_priv(ndev);
> +	priv->ndev = ndev;
> +	platform_set_drvdata(pdev, ndev);
> +	spin_lock_init(&priv->txlock);
> +	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +	if (!res) {
> +		ret = -EINVAL;
> +		goto init_fail;
> +	}
> +	ndev->base_addr = res->start;
> +	priv->base = devm_ioremap_resource(d, res);
> +	ret = IS_ERR(priv->base);
> +	if (ret) {
> +		dev_err(d, "devm_ioremap_resource failed\n");
> +		goto init_fail;
> +	}

If you're using devm_ioremap_resource(), you don't need to check the
resource above.  In any case, returning the value from IS_ERR() from
this function is not correct.

	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
	priv->base = devm_ioremap_resource(d, res);
	if (IS_ERR(priv->base)) {
		ret = PTR_ERR(priv->base);
		goto init_fail;
	}

You don't need to fill in ndev->base_addr (many drivers don't.)

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-18  8:40     ` Zhangfei Gao
@ 2014-03-18 11:25       ` Arnd Bergmann
  -1 siblings, 0 replies; 148+ messages in thread
From: Arnd Bergmann @ 2014-03-18 11:25 UTC (permalink / raw)
  To: Zhangfei Gao; +Cc: David S. Miller, linux-arm-kernel, netdev, devicetree

On Tuesday 18 March 2014 16:40:17 Zhangfei Gao wrote:

> +
> +static void __iomem *ppebase;

The global 'ppebase' seems hacky. Isn't that a SoC-specific register area, while
the rest of the driver is reusable across SoCs?

What does 'ppe' stand for?

What if there are multiple instances of this, which each have their own ppebase?

> +static void hip04_config_port(struct hip04_priv *priv, u32 speed, u32 duplex)
> +{
> +	u32 val;
> +
> +	priv->speed = speed;
> +	priv->duplex = duplex;
> +
> +	switch (speed) {
> +	case SPEED_1000:
> +		val = 8;
> +		break;
> +	case SPEED_100:
> +		if (priv->id)
> +			val = 7;
> +		else
> +			val = 1;
> +		break;
> +	default:
> +		val = 0;
> +		break;
> +	}
> +	writel_relaxed(val, priv->base + GE_PORT_MODE)

This also seems to encode knowledge about a particular implementation
into the driver. Maybe it's better to add a property for the port
mode?


> +static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
> +{
> +	writel_relaxed(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
> +}
> +
> +static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
> +{
> +	writel_relaxed(phys, ppebase + priv->port * 4 + PPE_CFG_RX_CFF_ADDR);
> +}
> +
> +static u32 hip04_recv_cnt(struct hip04_priv *priv)
> +{
> +	return readl_relaxed(priv->base + PPE_HIS_RX_PKT_CNT);
> +}

At the very least, the hip04_set_xmit_desc() function needs to use 'writel'
rather than 'writel_relaxed'. Otherwise data that is being sent out
can be stuck in the CPU's write buffers and you send stale data on the wire.

For the receive path, you may or may not need to use 'readl', depending
on how DMA is serialized by this device. If you have MSI interrupts, the
interrupt message should already do the serialization, but if you have
edge or level triggered interrupts, you normally need to have one readl()
from the device register between the IRQ and the data access.


> +static void endian_change(void *p, int size)
> +{
> +	unsigned int *to_cover = (unsigned int *)p;
> +	int i;
> +
> +	size = size >> 2;
> +	for (i = 0; i < size; i++)
> +		*(to_cover+i) = htonl(*(to_cover+i));
> +}
> +
> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
> +{
> +	struct hip04_priv *priv = container_of(napi,
> +			      struct hip04_priv, napi);
> +	struct net_device *ndev = priv->ndev;
> +	struct sk_buff *skb;
> +	struct rx_desc *desc;
> +	unsigned char *buf;
> +	int rx = 0;
> +	unsigned int cnt = hip04_recv_cnt(priv);
> +	unsigned int len, tmp[16];
> +
> +	while (cnt) {
> +		buf = priv->rx_buf[priv->rx_head];
> +		skb = build_skb(buf, priv->rx_buf_size);
> +		if (unlikely(!skb))
> +			net_dbg_ratelimited("build_skb failed\n");
> +		dma_map_single(&ndev->dev, skb->data,
> +			RX_BUF_SIZE, DMA_FROM_DEVICE);
> +		memcpy(tmp, skb->data, 64);
> +		endian_change((void *)tmp, 64);
> +		desc = (struct rx_desc *)tmp;
> +		len = desc->pkt_len;

The dma_map_single() seems misplaced here, for all I can tell, the
data has already been transferred. Maybe you mean dma_unmap_single?

I don't see why you copy 64 bytes out of the buffer using endian_change,
rather than just looking at the first word, which seems to have the
only value you are interested in.

> +		if (len > RX_BUF_SIZE)
> +			len = RX_BUF_SIZE;
> +		if (0 == len)
> +			break;
> +
> +		skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
> +		skb_put(skb, len);
> +		skb->protocol = eth_type_trans(skb, ndev);
> +		napi_gro_receive(&priv->napi, skb);
> +
> +		buf = netdev_alloc_frag(priv->rx_buf_size);
> +		if (!buf)
> +			return -ENOMEM;
> +		priv->rx_buf[priv->rx_head] = buf;
> +		dma_map_single(&ndev->dev, buf, RX_BUF_SIZE, DMA_TO_DEVICE);

Maybe you mean DMA_FROM_DEVICE? The call here doesn't seem to make any
sense. You also need to use the return value of dma_map_single() every
time you call it.

> +		hip04_set_recv_desc(priv, virt_to_phys(buf));

and put it right here in the next line. virt_to_phys() is not the correct
function call, that is what dma_map_single() is meant for.

> +		priv->rx_head = RX_NEXT(priv->rx_head);
> +		if (rx++ >= budget)
> +			break;
> +
> +		if (--cnt == 0)
> +			cnt = hip04_recv_cnt(priv);
> +	}
> +
> +	if (rx < budget) {
> +		napi_gro_flush(napi, false);
> +		__napi_complete(napi);
> +	}
> +
> +	/* enable rx interrupt */
> +	priv->reg_inten |= RCV_INT | RCV_NOBUF;
> +	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);


Why do you unconditionally turn on interrupts here? Shouldn't you
only do that after calling napi_complete()?


> +
> +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	struct tx_desc *desc = priv->td_ring[priv->tx_head];
> +	unsigned int tx_head = priv->tx_head;
> +	int ret;
> +
> +	hip04_tx_reclaim(ndev, false);
> +
> +	spin_lock_irq(&priv->txlock);
> +	if (priv->tx_count++ >= TX_DESC_NUM) {
> +		net_dbg_ratelimited("no TX space for packet\n");
> +		netif_stop_queue(ndev);
> +		ret = NETDEV_TX_BUSY;
> +		goto out_unlock;
> +	}
> +
> +	priv->tx_skb[tx_head] = skb;
> +	dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
> +	memset((void *)desc, 0, sizeof(*desc));
> +	desc->send_addr = (unsigned int)virt_to_phys(skb->data);

Just like above: you must not use virt_to_phys here, but rather use
the output of dma_map_single.

IIRC, you can't generally call dma_map_single() under a spinlock, so
better move that ahead. It may also be a slow operation.

> +
> +static int hip04_mac_open(struct net_device *ndev)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	int i;
> +
> +	hip04_reset_ppe(priv);
> +	for (i = 0; i < RX_DESC_NUM; i++) {
> +		dma_map_single(&ndev->dev, priv->rx_buf[i],
> +				RX_BUF_SIZE, DMA_TO_DEVICE);
> +		hip04_set_recv_desc(priv, virt_to_phys(priv->rx_buf[i]));
> +	}

And one more. Also DMA_FROM_DEVICE.

> +static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	int i;
> +
> +	priv->rx_buf_size = RX_BUF_SIZE +
> +			    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +
> +	priv->desc_pool = dma_pool_create(DRV_NAME, d, sizeof(struct tx_desc),
> +				SKB_DATA_ALIGN(sizeof(struct tx_desc)),	0);
> +	if (!priv->desc_pool)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < TX_DESC_NUM; i++) {
> +		priv->td_ring[i] = dma_pool_alloc(priv->desc_pool,
> +					GFP_ATOMIC, &priv->td_phys[i]);
> +		if (!priv->td_ring[i])
> +			return -ENOMEM;
> +	}

Why do you create a dma pool here, when you do all the allocations upfront?

It looks to me like you could simply turn the td_ring array of pointers
to tx descriptors into a an array of tx descriptors (no pointers) and allocate
that one using dma_alloc_coherent.


> +	if (!ppebase) {
> +		struct device_node *n;
> +
> +		n = of_find_compatible_node(NULL, NULL, "hisilicon,hip04-ppebase");
> +		if (!n) {
> +			ret = -EINVAL;
> +			netdev_err(ndev, "not find hisilicon,ppebase\n");
> +			goto init_fail;
> +		}
> +		ppebase = of_iomap(n, 0);
> +	}

How about using syscon_regmap_lookup_by_phandle() here? That way, you can have
a more generic abstraction of the ppe, and stick the port and id in there as
well, e.g.

	ppe-syscon = <&hip04ppe 1 4>; // ppe, port, id 

> +	ret = of_property_read_u32(node, "speed", &val);
> +	if (ret) {
> +		dev_warn(d, "not find speed info\n");
> +		priv->speed = SPEED_1000;
> +	}
> +
> +	if (SPEED_100 == val)
> +		priv->speed = SPEED_100;
> +	else
> +		priv->speed = SPEED_1000;
> +	priv->duplex = DUPLEX_FULL;

Why do you even need the speed here, shouldn't you get that information
from the phy through hip04_adjust_link?

	Arnd

^ permalink raw reply	[flat|nested] 148+ messages in thread

* [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
@ 2014-03-18 11:25       ` Arnd Bergmann
  0 siblings, 0 replies; 148+ messages in thread
From: Arnd Bergmann @ 2014-03-18 11:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Tuesday 18 March 2014 16:40:17 Zhangfei Gao wrote:

> +
> +static void __iomem *ppebase;

The global 'ppebase' seems hacky. Isn't that a SoC-specific register area, while
the rest of the driver is reusable across SoCs?

What does 'ppe' stand for?

What if there are multiple instances of this, which each have their own ppebase?

> +static void hip04_config_port(struct hip04_priv *priv, u32 speed, u32 duplex)
> +{
> +	u32 val;
> +
> +	priv->speed = speed;
> +	priv->duplex = duplex;
> +
> +	switch (speed) {
> +	case SPEED_1000:
> +		val = 8;
> +		break;
> +	case SPEED_100:
> +		if (priv->id)
> +			val = 7;
> +		else
> +			val = 1;
> +		break;
> +	default:
> +		val = 0;
> +		break;
> +	}
> +	writel_relaxed(val, priv->base + GE_PORT_MODE)

This also seems to encode knowledge about a particular implementation
into the driver. Maybe it's better to add a property for the port
mode?


> +static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
> +{
> +	writel_relaxed(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
> +}
> +
> +static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
> +{
> +	writel_relaxed(phys, ppebase + priv->port * 4 + PPE_CFG_RX_CFF_ADDR);
> +}
> +
> +static u32 hip04_recv_cnt(struct hip04_priv *priv)
> +{
> +	return readl_relaxed(priv->base + PPE_HIS_RX_PKT_CNT);
> +}

At the very least, the hip04_set_xmit_desc() function needs to use 'writel'
rather than 'writel_relaxed'. Otherwise data that is being sent out
can be stuck in the CPU's write buffers and you send stale data on the wire.

For the receive path, you may or may not need to use 'readl', depending
on how DMA is serialized by this device. If you have MSI interrupts, the
interrupt message should already do the serialization, but if you have
edge or level triggered interrupts, you normally need to have one readl()
from the device register between the IRQ and the data access.


> +static void endian_change(void *p, int size)
> +{
> +	unsigned int *to_cover = (unsigned int *)p;
> +	int i;
> +
> +	size = size >> 2;
> +	for (i = 0; i < size; i++)
> +		*(to_cover+i) = htonl(*(to_cover+i));
> +}
> +
> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
> +{
> +	struct hip04_priv *priv = container_of(napi,
> +			      struct hip04_priv, napi);
> +	struct net_device *ndev = priv->ndev;
> +	struct sk_buff *skb;
> +	struct rx_desc *desc;
> +	unsigned char *buf;
> +	int rx = 0;
> +	unsigned int cnt = hip04_recv_cnt(priv);
> +	unsigned int len, tmp[16];
> +
> +	while (cnt) {
> +		buf = priv->rx_buf[priv->rx_head];
> +		skb = build_skb(buf, priv->rx_buf_size);
> +		if (unlikely(!skb))
> +			net_dbg_ratelimited("build_skb failed\n");
> +		dma_map_single(&ndev->dev, skb->data,
> +			RX_BUF_SIZE, DMA_FROM_DEVICE);
> +		memcpy(tmp, skb->data, 64);
> +		endian_change((void *)tmp, 64);
> +		desc = (struct rx_desc *)tmp;
> +		len = desc->pkt_len;

The dma_map_single() seems misplaced here; as far as I can tell, the
data has already been transferred at this point. Maybe you mean dma_unmap_single()?

I don't see why you copy 64 bytes out of the buffer using endian_change,
rather than just looking at the first word, which seems to have the
only value you are interested in.
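Untested sketch of that simplification, assuming pkt_len is the first
32-bit word of the descriptor and is stored big-endian (inferred from
the code above, not confirmed against the hardware documentation):

```c
	len = be32_to_cpu(*(__be32 *)skb->data);
	if (len > RX_BUF_SIZE)
		len = RX_BUF_SIZE;
```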

> +		if (len > RX_BUF_SIZE)
> +			len = RX_BUF_SIZE;
> +		if (0 == len)
> +			break;
> +
> +		skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
> +		skb_put(skb, len);
> +		skb->protocol = eth_type_trans(skb, ndev);
> +		napi_gro_receive(&priv->napi, skb);
> +
> +		buf = netdev_alloc_frag(priv->rx_buf_size);
> +		if (!buf)
> +			return -ENOMEM;
> +		priv->rx_buf[priv->rx_head] = buf;
> +		dma_map_single(&ndev->dev, buf, RX_BUF_SIZE, DMA_TO_DEVICE);

Maybe you mean DMA_FROM_DEVICE? The call here doesn't seem to make any
sense. You also need to use the return value of dma_map_single() every
time you call it.

> +		hip04_set_recv_desc(priv, virt_to_phys(buf));

and put it right here in the next line. virt_to_phys() is not the correct
function to call; that is exactly what dma_map_single() is for.
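Something like this untested sketch (the rx_phys[] field is made up
here, but you need to remember the handle somewhere anyway for the
later unmap):

```c
	dma_addr_t phys;

	phys = dma_map_single(&ndev->dev, buf, RX_BUF_SIZE, DMA_FROM_DEVICE);
	if (dma_mapping_error(&ndev->dev, phys))
		return -EIO;
	priv->rx_phys[priv->rx_head] = phys;
	hip04_set_recv_desc(priv, phys);
```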

> +		priv->rx_head = RX_NEXT(priv->rx_head);
> +		if (rx++ >= budget)
> +			break;
> +
> +		if (--cnt == 0)
> +			cnt = hip04_recv_cnt(priv);
> +	}
> +
> +	if (rx < budget) {
> +		napi_gro_flush(napi, false);
> +		__napi_complete(napi);
> +	}
> +
> +	/* enable rx interrupt */
> +	priv->reg_inten |= RCV_INT | RCV_NOBUF;
> +	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);


Why do you unconditionally turn on interrupts here? Shouldn't you
only do that after calling napi_complete()?
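I would expect something like this instead (untested, same names as
the patch):

```c
	if (rx < budget) {
		napi_gro_flush(napi, false);
		__napi_complete(napi);
		/* re-enable rx interrupts only once polling has stopped */
		priv->reg_inten |= RCV_INT | RCV_NOBUF;
		writel(priv->reg_inten, priv->base + PPE_INTEN);
	}
```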


> +
> +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	struct tx_desc *desc = priv->td_ring[priv->tx_head];
> +	unsigned int tx_head = priv->tx_head;
> +	int ret;
> +
> +	hip04_tx_reclaim(ndev, false);
> +
> +	spin_lock_irq(&priv->txlock);
> +	if (priv->tx_count++ >= TX_DESC_NUM) {
> +		net_dbg_ratelimited("no TX space for packet\n");
> +		netif_stop_queue(ndev);
> +		ret = NETDEV_TX_BUSY;
> +		goto out_unlock;
> +	}
> +
> +	priv->tx_skb[tx_head] = skb;
> +	dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
> +	memset((void *)desc, 0, sizeof(*desc));
> +	desc->send_addr = (unsigned int)virt_to_phys(skb->data);

Just like above: you must not use virt_to_phys here, but rather use
the output of dma_map_single.

IIRC, you can't generally call dma_map_single() under a spinlock, so it's
better to move that call ahead of taking the lock. It may also be a slow
operation.
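Untested sketch of both points, i.e. map before taking the lock and use
the returned handle in the descriptor:

```c
static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
{
	struct hip04_priv *priv = netdev_priv(ndev);
	struct tx_desc *desc;
	dma_addr_t phys;

	phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
	if (dma_mapping_error(&ndev->dev, phys)) {
		dev_kfree_skb(skb);
		return NETDEV_TX_OK;
	}

	spin_lock_irq(&priv->txlock);
	/* ... ring-full check as before ... */
	desc = priv->td_ring[priv->tx_head];
	priv->tx_skb[priv->tx_head] = skb;
	memset(desc, 0, sizeof(*desc));
	desc->send_addr = phys;	/* from dma_map_single(), not virt_to_phys() */
	/* ... */
	spin_unlock_irq(&priv->txlock);

	return NETDEV_TX_OK;
}
```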

> +
> +static int hip04_mac_open(struct net_device *ndev)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	int i;
> +
> +	hip04_reset_ppe(priv);
> +	for (i = 0; i < RX_DESC_NUM; i++) {
> +		dma_map_single(&ndev->dev, priv->rx_buf[i],
> +				RX_BUF_SIZE, DMA_TO_DEVICE);
> +		hip04_set_recv_desc(priv, virt_to_phys(priv->rx_buf[i]));
> +	}

And one more. Also DMA_FROM_DEVICE.

> +static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	int i;
> +
> +	priv->rx_buf_size = RX_BUF_SIZE +
> +			    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> +
> +	priv->desc_pool = dma_pool_create(DRV_NAME, d, sizeof(struct tx_desc),
> +				SKB_DATA_ALIGN(sizeof(struct tx_desc)),	0);
> +	if (!priv->desc_pool)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < TX_DESC_NUM; i++) {
> +		priv->td_ring[i] = dma_pool_alloc(priv->desc_pool,
> +					GFP_ATOMIC, &priv->td_phys[i]);
> +		if (!priv->td_ring[i])
> +			return -ENOMEM;
> +	}

Why do you create a dma pool here, when you do all the allocations upfront?

It looks to me like you could simply turn the td_ring array of pointers
to tx descriptors into an array of tx descriptors (no pointers) and allocate
that one using dma_alloc_coherent().
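Untested sketch, with td_ring turned into a plain `struct tx_desc *`
and td_phys shrunk from an array to a single dma_addr_t:

```c
	priv->td_ring = dma_alloc_coherent(d,
				TX_DESC_NUM * sizeof(struct tx_desc),
				&priv->td_phys, GFP_KERNEL);
	if (!priv->td_ring)
		return -ENOMEM;
	/* descriptor i is priv->td_ring[i], its bus address is
	 * priv->td_phys + i * sizeof(struct tx_desc) */
```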


> +	if (!ppebase) {
> +		struct device_node *n;
> +
> +		n = of_find_compatible_node(NULL, NULL, "hisilicon,hip04-ppebase");
> +		if (!n) {
> +			ret = -EINVAL;
> +			netdev_err(ndev, "not find hisilicon,ppebase\n");
> +			goto init_fail;
> +		}
> +		ppebase = of_iomap(n, 0);
> +	}

How about using syscon_regmap_lookup_by_phandle() here? That way, you can have
a more generic abstraction of the ppe, and stick the port and id in there as
well, e.g.

	ppe-syscon = <&hip04ppe 1 4>; // ppe, port, id 
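The consumer side could then look roughly like this (property name and
cell layout are made up to match the example above):

```c
	struct of_phandle_args args;
	struct regmap *ppe;

	ret = of_parse_phandle_with_fixed_args(node, "ppe-syscon", 2, 0, &args);
	if (ret)
		goto init_fail;
	ppe = syscon_node_to_regmap(args.np);
	of_node_put(args.np);
	if (IS_ERR(ppe)) {
		ret = PTR_ERR(ppe);
		goto init_fail;
	}
	priv->port = args.args[0];
	priv->id = args.args[1];
```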

> +	ret = of_property_read_u32(node, "speed", &val);
> +	if (ret) {
> +		dev_warn(d, "not find speed info\n");
> +		priv->speed = SPEED_1000;
> +	}
> +
> +	if (SPEED_100 == val)
> +		priv->speed = SPEED_100;
> +	else
> +		priv->speed = SPEED_1000;
> +	priv->duplex = DUPLEX_FULL;

Why do you even need the speed here, shouldn't you get that information
from the phy through hip04_adjust_link?
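With phylib, the speed and duplex normally arrive through the
adjust_link callback, roughly like this (hip04_config_port() is a
made-up helper standing in for whatever reprograms the MAC):

```c
static void hip04_adjust_link(struct net_device *ndev)
{
	struct hip04_priv *priv = netdev_priv(ndev);
	struct phy_device *phy = priv->phy;

	if (phy->link &&
	    (priv->speed != phy->speed || priv->duplex != phy->duplex)) {
		priv->speed = phy->speed;	/* SPEED_100 or SPEED_1000 */
		priv->duplex = phy->duplex;
		hip04_config_port(priv);	/* made up: reprogram the MAC */
	}
}
```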

	Arnd

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 1/3] Documentation: add Device tree bindings for Hisilicon hip04 ethernet
  2014-03-18  8:40   ` Zhangfei Gao
@ 2014-03-18 12:34     ` Mark Rutland
  -1 siblings, 0 replies; 148+ messages in thread
From: Mark Rutland @ 2014-03-18 12:34 UTC (permalink / raw)
  To: Zhangfei Gao; +Cc: David S. Miller, netdev, linux-arm-kernel, devicetree

On Tue, Mar 18, 2014 at 08:40:15AM +0000, Zhangfei Gao wrote:
> This patch adds the Device Tree bindings for the Hisilicon hip04
> Ethernet controller, including 100M / 1000M controller.
> 
> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> ---
>  .../bindings/net/hisilicon-hip04-net.txt           |   74 ++++++++++++++++++++
>  1 file changed, 74 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
> 
> diff --git a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
> new file mode 100644
> index 0000000..c918f08
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
> @@ -0,0 +1,74 @@
> +Hisilicon hip04 Ethernet Controller
> +
> +* Ethernet controller node
> +
> +Required properties:
> +- compatible: should be "hisilicon,hip04-mac".
> +- reg: address and length of the register set for the device.
> +- interrupts: interrupt for the device.
> +- port: ppe port number connected to the controller: range from 0 to 31.

ppe?

Will there ever be more than one ppe? If so, describing the linkage to
the ppe with a phandle + args approach is preferable.

> +- speed: 100 (100M) or 1000 (1000M).

Can you not query this from the hardware?

> +- id: should be different and fe should be 0.

This description is useless.

What is this for, and why does this need to be in the dt? What is "fe"?

> +
> +Optional Properties:
> +- phy-handle : the phandle to a PHY node
> +
> +
> +* Ethernet ppe node: control rx & tx fifos of all ethernet controllers
> +
> +Required properties:
> +- compatible: should be "hisilicon,hip04-ppebase".

Why "ppebase" rather than "ppe"?

> +- reg: address and length of the register set for the node.

s/node/device/

Cheers,
Mark.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 1/3] Documentation: add Device tree bindings for Hisilicon hip04 ethernet
  2014-03-18  8:40   ` Zhangfei Gao
@ 2014-03-18 12:51     ` Sergei Shtylyov
  -1 siblings, 0 replies; 148+ messages in thread
From: Sergei Shtylyov @ 2014-03-18 12:51 UTC (permalink / raw)
  To: Zhangfei Gao, David S. Miller; +Cc: netdev, linux-arm-kernel, devicetree

Hello.

On 18-03-2014 12:40, Zhangfei Gao wrote:

> This patch adds the Device Tree bindings for the Hisilicon hip04
> Ethernet controller, including 100M / 1000M controller.

> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> ---
>   .../bindings/net/hisilicon-hip04-net.txt           |   74 ++++++++++++++++++++
>   1 file changed, 74 insertions(+)
>   create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt

> diff --git a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
> new file mode 100644
> index 0000000..c918f08
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
> @@ -0,0 +1,74 @@
> +Hisilicon hip04 Ethernet Controller
> +
> +* Ethernet controller node
> +
> +Required properties:
> +- compatible: should be "hisilicon,hip04-mac".
> +- reg: address and length of the register set for the device.
> +- interrupts: interrupt for the device.
> +- port: ppe port number connected to the controller: range from 0 to 31.
> +- speed: 100 (100M) or 1000 (1000M).

    There's standard "max-speed" property for that, see 
Documentation/devicetree/bindings/net/ethernet.txt in the 'net-next.git' repo.

> +Optional Properties:
> +- phy-handle : the phandle to a PHY node

    Please refer instead to the above mentioned file for this standard 
property -- it is already described there. See other binding files as the example.

WBR, Sergei

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 2/3] net: hisilicon: new hip04 MDIO driver
  2014-03-18  8:40   ` Zhangfei Gao
@ 2014-03-18 17:28       ` Florian Fainelli
  -1 siblings, 0 replies; 148+ messages in thread
From: Florian Fainelli @ 2014-03-18 17:28 UTC (permalink / raw)
  To: Zhangfei Gao
  Cc: David S. Miller, netdev, linux-arm-kernel, devicetree

2014-03-18 1:40 GMT-07:00 Zhangfei Gao <zhangfei.gao@linaro.org>:
> Hisilicon hip04 platform mdio driver
> Reuse Marvell phy drivers/net/phy/marvell.c
>
> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> ---
>  drivers/net/ethernet/Kconfig                |    1 +
>  drivers/net/ethernet/Makefile               |    1 +
>  drivers/net/ethernet/hisilicon/Kconfig      |   31 +++++
>  drivers/net/ethernet/hisilicon/Makefile     |    5 +
>  drivers/net/ethernet/hisilicon/hip04_mdio.c |  190 +++++++++++++++++++++++++++
>  5 files changed, 228 insertions(+)
>  create mode 100644 drivers/net/ethernet/hisilicon/Kconfig
>  create mode 100644 drivers/net/ethernet/hisilicon/Makefile
>  create mode 100644 drivers/net/ethernet/hisilicon/hip04_mdio.c
>
> diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
> index 39484b5..cef103d 100644
> --- a/drivers/net/ethernet/Kconfig
> +++ b/drivers/net/ethernet/Kconfig
> @@ -55,6 +55,7 @@ source "drivers/net/ethernet/neterion/Kconfig"
>  source "drivers/net/ethernet/faraday/Kconfig"
>  source "drivers/net/ethernet/freescale/Kconfig"
>  source "drivers/net/ethernet/fujitsu/Kconfig"
> +source "drivers/net/ethernet/hisilicon/Kconfig"
>  source "drivers/net/ethernet/hp/Kconfig"
>  source "drivers/net/ethernet/ibm/Kconfig"
>  source "drivers/net/ethernet/intel/Kconfig"
> diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
> index adf61af..f70b166 100644
> --- a/drivers/net/ethernet/Makefile
> +++ b/drivers/net/ethernet/Makefile
> @@ -30,6 +30,7 @@ obj-$(CONFIG_NET_VENDOR_EXAR) += neterion/
>  obj-$(CONFIG_NET_VENDOR_FARADAY) += faraday/
>  obj-$(CONFIG_NET_VENDOR_FREESCALE) += freescale/
>  obj-$(CONFIG_NET_VENDOR_FUJITSU) += fujitsu/
> +obj-$(CONFIG_NET_VENDOR_HISILICON) += hisilicon/
>  obj-$(CONFIG_NET_VENDOR_HP) += hp/
>  obj-$(CONFIG_NET_VENDOR_IBM) += ibm/
>  obj-$(CONFIG_NET_VENDOR_INTEL) += intel/
> diff --git a/drivers/net/ethernet/hisilicon/Kconfig b/drivers/net/ethernet/hisilicon/Kconfig
> new file mode 100644
> index 0000000..4b1c065
> --- /dev/null
> +++ b/drivers/net/ethernet/hisilicon/Kconfig
> @@ -0,0 +1,31 @@
> +#
> +# HISILICON device configuration
> +#
> +
> +config NET_VENDOR_HISILICON
> +       bool "Hisilicon devices"
> +       default y
> +       depends on ARM
> +       ---help---
> +         If you have a network (Ethernet) card belonging to this class, say Y
> +         and read the Ethernet-HOWTO, available from
> +         <http://www.tldp.org/docs.html#howto>.
> +
> +         Note that the answer to this question doesn't directly affect the
> +         kernel: saying N will just cause the configurator to skip all
> +         the questions about MOXA ART devices. If you say Y, you will be asked
> +         for your specific card in the following questions.
> +
> +if NET_VENDOR_HISILICON
> +
> +config HIP04_ETH
> +       tristate "HISILICON P04 Ethernet support"
> +       select NET_CORE
> +       select PHYLIB
> +       select MARVELL_PHY
> +       ---help---
> +         If you wish to compile a kernel for a hardware with hisilicon p04 SoC and
> +         want to use the internal ethernet then you should answer Y to this.
> +
> +
> +endif # NET_VENDOR_HISILICON
> diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
> new file mode 100644
> index 0000000..1d6eb6e
> --- /dev/null
> +++ b/drivers/net/ethernet/hisilicon/Makefile
> @@ -0,0 +1,5 @@
> +#
> +# Makefile for the HISILICON network device drivers.
> +#
> +
> +obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
> diff --git a/drivers/net/ethernet/hisilicon/hip04_mdio.c b/drivers/net/ethernet/hisilicon/hip04_mdio.c
> new file mode 100644
> index 0000000..960adc2
> --- /dev/null
> +++ b/drivers/net/ethernet/hisilicon/hip04_mdio.c
> @@ -0,0 +1,190 @@
> +
> +/* Copyright (c) 2014 Linaro Ltd.
> + * Copyright (c) 2014 Hisilicon Limited.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/platform_device.h>
> +#include <linux/io.h>
> +#include <linux/of_mdio.h>
> +#include <linux/delay.h>
> +
> +#define MDIO_CMD_REG           0x0
> +#define MDIO_ADDR_REG          0x4
> +#define MDIO_WDATA_REG         0x8
> +#define MDIO_RDATA_REG         0xc
> +#define MDIO_STA_REG           0x10
> +
> +#define MDIO_START             BIT(14)
> +#define MDIO_R_VALID           BIT(1)
> +#define MDIO_READ              (BIT(12) | BIT(11) | MDIO_START)
> +#define MDIO_WRITE             (BIT(12) | BIT(10) | MDIO_START)
> +
> +struct hip04_mdio_priv {
> +       void __iomem *base;
> +};
> +
> +#define WAIT_TIMEOUT 10
> +static int hip04_mdio_wait_ready(struct mii_bus *bus)
> +{
> +       struct hip04_mdio_priv *priv = bus->priv;
> +       int i;
> +
> +       for (i = 0; readl_relaxed(priv->base + MDIO_CMD_REG) & MDIO_START; i++) {
> +               if (i == WAIT_TIMEOUT)
> +                       return -ETIMEDOUT;
> +               msleep(20);
> +       }
> +
> +       return 0;
> +}
> +
> +static int hip04_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
> +{
> +       struct hip04_mdio_priv *priv = bus->priv;
> +       u32 val;
> +       int ret;
> +
> +       ret = hip04_mdio_wait_ready(bus);
> +       if (ret < 0)
> +               goto out;
> +
> +       val = regnum | (mii_id << 5) | MDIO_READ;
> +       writel_relaxed(val, priv->base + MDIO_CMD_REG);
> +
> +       ret = hip04_mdio_wait_ready(bus);
> +       if (ret < 0)
> +               goto out;
> +       val = readl_relaxed(priv->base + MDIO_STA_REG);
> +       if (val & MDIO_R_VALID) {
> +               dev_err(bus->parent, "SMI bus read not valid\n");
> +               ret = -ENODEV;
> +               goto out;
> +       }
> +       val = readl_relaxed(priv->base + MDIO_RDATA_REG);
> +       ret = val & 0xFFFF;
> +out:
> +       return ret;
> +}
> +
> +static int hip04_mdio_write(struct mii_bus *bus, int mii_id,
> +                           int regnum, u16 value)
> +{
> +       struct hip04_mdio_priv *priv = bus->priv;
> +       u32 val;
> +       int ret;
> +
> +       ret = hip04_mdio_wait_ready(bus);
> +       if (ret < 0)
> +               goto out;
> +
> +       writel_relaxed(value, priv->base + MDIO_WDATA_REG);
> +       val = regnum | (mii_id << 5) | MDIO_WRITE;
> +       writel_relaxed(val, priv->base + MDIO_CMD_REG);
> +out:
> +       return ret;
> +}
> +
> +
> +static int hip04_mdio_reset(struct mii_bus *bus)
> +{
> +       int temp, err, i;
> +
> +       for (i = 0; i < 2; i++) {
> +               hip04_mdio_write(bus, i, 22, 0);
> +               temp = hip04_mdio_read(bus, i, MII_BMCR);
> +               temp |= BMCR_RESET;
> +               err = hip04_mdio_write(bus, i, MII_BMCR, temp);
> +               if (err < 0)
> +                       return err;
> +       }
> +
> +       mdelay(500);

This does not look correct; you should iterate over all possible PHYs
(PHY_MAX_ADDR) instead of hardcoding the loop to 2.

I think we might want to remove the mdio bus reset callback in general,
as the PHY library should already take care of software-resetting the
PHY to put it in a sane state, as well as waiting for the appropriate
delay before using it; unlike here, it also polls for BMCR_RESET to be
cleared by the PHY.
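If you keep the callback at all, it should look more like this untested
sketch (register 22 is the Marvell page-select register, which arguably
does not belong in a generic MDIO bus driver either):

```c
static int hip04_mdio_reset(struct mii_bus *bus)
{
	int temp, err, i;

	for (i = 0; i < PHY_MAX_ADDR; i++) {
		hip04_mdio_write(bus, i, 22, 0);	/* Marvell page select */
		temp = hip04_mdio_read(bus, i, MII_BMCR);
		if (temp < 0)
			continue;	/* no PHY at this address */
		err = hip04_mdio_write(bus, i, MII_BMCR, temp | BMCR_RESET);
		if (err < 0)
			return err;
	}
	mdelay(500);
	return 0;
}
```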

> +       return 0;
> +}
> +
> +static int hip04_mdio_probe(struct platform_device *pdev)
> +{
> +       struct resource *r;
> +       struct mii_bus *bus;
> +       struct hip04_mdio_priv *priv;
> +       int ret;
> +
> +       r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +       if (!r) {
> +               dev_err(&pdev->dev, "No SMI register address given\n");
> +               return -ENODEV;
> +       }
> +
> +       bus = mdiobus_alloc_size(sizeof(struct hip04_mdio_priv));
> +       if (!bus) {
> +               dev_err(&pdev->dev, "Cannot allocate MDIO bus\n");
> +               return -ENOMEM;
> +       }
> +
> +       bus->name = "hip04_mdio_bus";
> +       bus->read = hip04_mdio_read;
> +       bus->write = hip04_mdio_write;
> +       bus->reset = hip04_mdio_reset;
> +       snprintf(bus->id, MII_BUS_ID_SIZE, "%s-mii",
> +                dev_name(&pdev->dev));
> +       bus->parent = &pdev->dev;
> +       priv = bus->priv;
> +       priv->base = devm_ioremap(&pdev->dev, r->start, resource_size(r));
> +       if (!priv->base) {
> +               dev_err(&pdev->dev, "Unable to remap SMI register\n");
> +               ret = -ENODEV;
> +               goto out_mdio;
> +       }
> +
> +       ret = of_mdiobus_register(bus, pdev->dev.of_node);
> +       if (ret < 0) {
> +               dev_err(&pdev->dev, "Cannot register MDIO bus (%d)\n", ret);
> +               goto out_mdio;
> +       }
> +
> +       platform_set_drvdata(pdev, bus);
> +
> +       return 0;
> +
> +out_mdio:
> +       mdiobus_free(bus);
> +       return ret;
> +}
> +
> +static int hip04_mdio_remove(struct platform_device *pdev)
> +{
> +       struct mii_bus *bus = platform_get_drvdata(pdev);
> +
> +       mdiobus_unregister(bus);
> +       mdiobus_free(bus);
> +
> +       return 0;
> +}
> +
> +static const struct of_device_id hip04_mdio_match[] = {
> +       { .compatible = "hisilicon,hip04-mdio" },
> +       { }
> +};
> +MODULE_DEVICE_TABLE(of, hip04_mdio_match);
> +
> +static struct platform_driver hip04_mdio_driver = {
> +       .probe = hip04_mdio_probe,
> +       .remove = hip04_mdio_remove,
> +       .driver = {
> +               .name = "hip04-mdio",
> +               .owner = THIS_MODULE,
> +               .of_match_table = hip04_mdio_match,
> +       },
> +};
> +
> +module_platform_driver(hip04_mdio_driver);
> +
> +MODULE_DESCRIPTION("HISILICON P04 MDIO interface driver");
> +MODULE_LICENSE("GPL v2");
> +MODULE_ALIAS("platform:hip04-mdio");
> --
> 1.7.9.5
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel



-- 
Florian
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 148+ messages in thread

> +MODULE_ALIAS("platform:hip04-mdio");
> --
> 1.7.9.5
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel



-- 
Florian

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 1/3] Documentation: add Device tree bindings for Hisilicon hip04 ethernet
  2014-03-18  8:40   ` Zhangfei Gao
@ 2014-03-18 17:39     ` Florian Fainelli
  -1 siblings, 0 replies; 148+ messages in thread
From: Florian Fainelli @ 2014-03-18 17:39 UTC (permalink / raw)
  To: Zhangfei Gao; +Cc: David S. Miller, netdev, linux-arm-kernel, devicetree

2014-03-18 1:40 GMT-07:00 Zhangfei Gao <zhangfei.gao@linaro.org>:
> This patch adds the Device Tree bindings for the Hisilicon hip04
> Ethernet controller, including 100M / 1000M controller.
>
> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> ---
>  .../bindings/net/hisilicon-hip04-net.txt           |   74 ++++++++++++++++++++
>  1 file changed, 74 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
>
> diff --git a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
> new file mode 100644
> index 0000000..c918f08
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
> @@ -0,0 +1,74 @@
> +Hisilicon hip04 Ethernet Controller
> +
> +* Ethernet controller node
> +
> +Required properties:
> +- compatible: should be "hisilicon,hip04-mac".
> +- reg: address and length of the register set for the device.
> +- interrupts: interrupt for the device.
> +- port: ppe port number connected to the controller: range from 0 to 31.
> +- speed: 100 (100M) or 1000 (1000M).
> +- id: should be different and fe should be 0.
> +
> +Optional Properties:
> +- phy-handle : the phandle to a PHY node
> +
> +
> +* Ethernet ppe node: control rx & tx fifos of all ethernet controllers
> +
> +Required properties:
> +- compatible: should be "hisilicon,hip04-ppebase".
> +- reg: address and length of the register set for the node.
> +
> +
> +* MDIO bus node:
> +
> +Required properties:
> +
> +- compatible: "hisilicon,hip04-mdio"
> +- Inherets from MDIO bus node binding[1]
> +[1] Documentation/devicetree/bindings/net/phy.txt
> +
> +Example:
> +       mdio {
> +               compatible = "hisilicon,hip04-mdio";
> +               reg = <0x28f1000 0x1000>;
> +               #address-cells = <1>;
> +               #size-cells = <0>;
> +
> +               phy0: ethernet-phy@0 {
> +                       reg = <0>;
> +                       marvell,reg-init = <18 0x14 0 0x8001>;
> +                       device_type = "ethernet-phy";

You are missing a compatible string such as
"ethernet-phy-ieee802.3-c22", please take a look at
Documentation/devicetree/bindings/net/phy.txt for an example.

device_type is deprecated and should be removed.

> +               };
> +
> +               phy1: ethernet-phy@1 {
> +                       reg = <1>;
> +                       marvell,reg-init = <18 0x14 0 0x8001>;
> +                       device_type = "ethernet-phy";
> +               };
> +       };
> +
> +       ppebase: ppebase@28c0000 {
> +               compatible = "hisilicon,hip04-ppebase";
> +               reg = <0x28c0000 0x10000>;

This should probably look like:

                    #address-cells = <0>;
                    #size-cells = <0>;

                    eth0_port: port@0 {
                         reg = <0>;
                    };

                    eth1_port: port@1f {
                         reg = <31>;
                    };

This looks like something similar to mv643xx_eth, you should see
Documentation/devicetree/bindings/marvell.txt for hints on how to
model the representation in a similar fashion.

> +       };
> +
> +       fe: ethernet@28b0000 {
> +               compatible = "hisilicon,hip04-mac";
> +               reg = <0x28b0000 0x10000>;
> +               interrupts = <0 413 4>;
> +               port = <31>;

I do not think this is the right way to expose that; port should be
specialized to e.g. hisilicon,port, or you should use a phandle to the
"ppebase" node, which exposes the different ports as subnodes:

                    hisilicon,port-handle = <&eth0_port>;

> +               speed = <100>;

max-speed is the standard property for this

> +               id = <0>;

id here is a software concept, either you create properly numbered
aliases for these nodes, and use of_alias_get_id(), or you do not use
these identifiers at all.

> +       };
> +
> +       ge0: ethernet@2800000 {
> +               compatible = "hisilicon,hip04-mac";
> +               reg = <0x2800000 0x10000>;
> +               interrupts = <0 402 4>;
> +               port = <0>;
> +               speed = <1000>;
> +               id = <1>;
> +               phy-handle = <&phy0>;
> +       };
> --
> 1.7.9.5
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel



-- 
Florian

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-18 10:46         ` Russell King - ARM Linux
@ 2014-03-20  9:51           ` Zhangfei Gao
  -1 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-20  9:51 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Zhangfei Gao, netdev, David S. Miller, linux-arm-kernel, devicetree

Dear Russell,

Thanks for taking the time to give so many excellent suggestions; they are really helpful.

On Tue, Mar 18, 2014 at 6:46 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> I was just browsing this patch when I noticed some of these issues - I
> haven't done a full review of this driver, I'm just commenting on the
> things I've spotted.
>
> On Tue, Mar 18, 2014 at 04:40:17PM +0800, Zhangfei Gao wrote:
>> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
>> +{
>> +     struct hip04_priv *priv = container_of(napi,
>> +                           struct hip04_priv, napi);
>> +     struct net_device *ndev = priv->ndev;
>> +     struct sk_buff *skb;
>> +     struct rx_desc *desc;
>> +     unsigned char *buf;
>> +     int rx = 0;
>> +     unsigned int cnt = hip04_recv_cnt(priv);
>> +     unsigned int len, tmp[16];
>> +
>> +     while (cnt) {
>> +             buf = priv->rx_buf[priv->rx_head];
>> +             skb = build_skb(buf, priv->rx_buf_size);
>> +             if (unlikely(!skb))
>> +                     net_dbg_ratelimited("build_skb failed\n");
>> +             dma_map_single(&ndev->dev, skb->data,
>> +                     RX_BUF_SIZE, DMA_FROM_DEVICE);
>
> This is incorrect.
>
> buf = buffer alloc()
> /* CPU owns buffer and can read/write it, device does not */
> dev_addr = dma_map_single(dev, buf, ..., DMA_FROM_DEVICE);
> /* Device owns buffer and can write it, CPU does not access it */
> dma_unmap_single(dev, dev_addr, ..., DMA_FROM_DEVICE);
> /* CPU owns buffer again and can read/write it, device does not */
>
> Please turn on DMA API debugging in the kernel debug options and verify
> whether your driver causes it to complain (it will.)

Yes, you are right.
However, after switching to dma_map/unmap_single, I still get a warning like
"DMA-API: device driver failed to check map error"; I am not sure whether
it can be ignored.

>
> I think you want dma_unmap_single() here.
>
>> +             memcpy(tmp, skb->data, 64);
>> +             endian_change((void *)tmp, 64);
>> +             desc = (struct rx_desc *)tmp;
>> +             len = desc->pkt_len;
>
> This is a rather expensive way to do this.  Presumably the descriptors
> are always big endian?  If so, why not:
>
>                 desc = skb->data;
>                 len = be16_to_cpu(desc->pkt_len);
>
> ?  You may need to lay the struct out differently for this to work so
> the offset which pkt_len accesses is correct.

Great, that is exactly what I was looking for.
I had not thought of changing the layout of the struct before.
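The pattern Russell describes — lay the struct out so the big-endian pkt_len can be read in place with be16_to_cpu(), instead of copying and byte-swapping 64 bytes — looks roughly like this as a userspace sketch (the rx_desc layout and field names here are hypothetical; only the conversion pattern matters):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor layout: fields stored big-endian by hardware. */
struct rx_desc {
    uint16_t pkt_len_be;   /* big-endian on the wire */
    uint16_t flags_be;
};

/* Equivalent of the kernel's be16_to_cpu() for this sketch: reads the
 * bytes in memory order, so it is correct on any host endianness. */
static uint16_t be16_to_cpu_sketch(uint16_t be)
{
    const uint8_t *p = (const uint8_t *)&be;
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read the length straight out of the buffer: no 64-byte copy, no full swap. */
static uint16_t desc_pkt_len(const void *buf)
{
    struct rx_desc d;
    memcpy(&d, buf, sizeof(d));          /* safe unaligned access */
    return be16_to_cpu_sketch(d.pkt_len_be);
}
```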

>
> Also... do you not have any flags which indicate whether the packet
> received was in error?
>
>> +
>> +             if (len > RX_BUF_SIZE)
>> +                     len = RX_BUF_SIZE;
>> +             if (0 == len)
>> +                     break;
>> +
>> +             skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
>> +             skb_put(skb, len);
>> +             skb->protocol = eth_type_trans(skb, ndev);
>> +             napi_gro_receive(&priv->napi, skb);
>> +
>> +             buf = netdev_alloc_frag(priv->rx_buf_size);
>> +             if (!buf)
>> +                     return -ENOMEM;
>> +             priv->rx_buf[priv->rx_head] = buf;
>> +             dma_map_single(&ndev->dev, buf, RX_BUF_SIZE, DMA_TO_DEVICE);
>> +             hip04_set_recv_desc(priv, virt_to_phys(buf));
>
> No need for virt_to_phys() here - dma_map_single() returns the device
> address.
Got it.
I used virt_to_phys because it happened to give the same result, but it
would differ in the IOMMU case.

In fact, the hardware can do the cache flushing itself, though that
feature is not enabled yet.
Once it is, dma_map/unmap_single may no longer be needed.

>
>> +
>> +             priv->rx_head = RX_NEXT(priv->rx_head);
>> +             if (rx++ >= budget)
>> +                     break;
>> +
>> +             if (--cnt == 0)
>> +                     cnt = hip04_recv_cnt(priv);
>> +     }
>> +
>> +     if (rx < budget) {
>> +             napi_gro_flush(napi, false);
>> +             __napi_complete(napi);
>> +     }
>> +
>> +     /* enable rx interrupt */
>> +     priv->reg_inten |= RCV_INT | RCV_NOBUF;
>> +     writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
>
> This doesn't look right - you're supposed to re-enable receive interrupts
> when you receive less than "budget" packets.
Yes, got it.

>
>> +static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
>> +{
>> +     struct net_device *ndev = (struct net_device *) dev_id;
>> +     struct hip04_priv *priv = netdev_priv(ndev);
>> +     u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
>> +     u32 val = DEF_INT_MASK;
>> +
>> +     writel_relaxed(val, priv->base + PPE_RINT);
>> +
>> +     if ((ists & RCV_INT) || (ists & RCV_NOBUF)) {
>
> What you get with this is the compiler generating code to test RCV_INT,
> and then if that's false, code to test RCV_NOBUF.  There's no possibility
> for the compiler to optimise that because it's part of the language spec
> that condition1 || condition2 will always have condition1 evaluated first,
> and condition2 will only be evaluated if condition1 was false.
>
>         if (ists & (RCV_INT | RCV_NOBUF)) {
>
> would more than likely be more efficient here.

Cool, I had never thought about this optimization.
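The single-mask form is behaviorally identical to the `||` form, just cheaper; a quick userspace check (with made-up bit positions standing in for RCV_INT and RCV_NOBUF) confirms the equivalence:

```c
#include <assert.h>
#include <stdint.h>

/* Made-up bit positions, for illustration only. */
#define RCV_INT   (1u << 0)
#define RCV_NOBUF (1u << 1)

/* Branchier form: tests each bit separately, as in the original code. */
static int wants_napi_or(uint32_t ists)
{
    return (ists & RCV_INT) || (ists & RCV_NOBUF);
}

/* Single test against the combined mask, as Russell suggests. */
static int wants_napi_mask(uint32_t ists)
{
    return (ists & (RCV_INT | RCV_NOBUF)) != 0;
}
```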

>
>> +             if (napi_schedule_prep(&priv->napi)) {
>> +                     /* disable rx interrupt */
>> +                     priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
>> +                     writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
>> +                     __napi_schedule(&priv->napi);
>> +             }
>> +     }
>> +
>> +     return IRQ_HANDLED;
>> +}
>> +
>> +static void hip04_tx_reclaim(struct net_device *ndev, bool force)
>> +{
>> +     struct hip04_priv *priv = netdev_priv(ndev);
>> +     unsigned tx_head = priv->tx_head;
>> +     unsigned tx_tail = priv->tx_tail;
>> +     struct tx_desc *desc = priv->td_ring[priv->tx_tail];
>> +
>> +     spin_lock_irq(&priv->txlock);
>
> Do you know for certain that interrupts were (and always will be) definitely
> enabled prior to this point?  If not, you should use spin_lock_irqsave()..
> spin_unlock_irqrestore().
Yes.
After double-checking, I found the spin_lock can be dropped entirely here,
since xmit is protected by RCU.

>
>> +     while (tx_tail != tx_head) {
>> +             if (desc->send_addr != 0) {
>> +                     if (force)
>> +                             desc->send_addr = 0;
>> +                     else
>> +                             break;
>> +             }
>
> dma_unmap_single(&ndev->dev, dev_addr, skb->len, DMA_TO_DEVICE) ?
>
> It looks like your device zeros the send address when it has finished
> transmitting - if this is true, then you will need to store dev_addr
> separately for each transmit packet.

Yes, dev_addr is cleared once transmission is over, so it should be stored
separately for the unmap.
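Storing the DMA address alongside each tx_skb slot lets reclaim unmap correctly even after the hardware has zeroed desc->send_addr. A minimal userspace model of that bookkeeping (hypothetical structures, with the real dma_unmap_single() left as a comment, and every queued slot treated as completed):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TX_DESC_NUM 4
#define TX_NEXT(i)  (((i) + 1) % TX_DESC_NUM)

struct tx_slot {
    uint64_t dev_addr;   /* saved at map time; survives hw zeroing send_addr */
    size_t   len;
    int      in_use;
};

static struct tx_slot tx_ring[TX_DESC_NUM];
static unsigned tx_head, tx_tail;

/* Record the mapping when the packet is queued. */
static void tx_queue(uint64_t dev_addr, size_t len)
{
    tx_ring[tx_head] = (struct tx_slot){ dev_addr, len, 1 };
    tx_head = TX_NEXT(tx_head);
}

/* Reclaim slots: unmap with the *saved* address, not the descriptor field
 * the hardware already cleared. */
static int tx_reclaim(void)
{
    int freed = 0;
    while (tx_tail != tx_head) {
        struct tx_slot *s = &tx_ring[tx_tail];
        /* dma_unmap_single(dev, s->dev_addr, s->len, DMA_TO_DEVICE); */
        s->in_use = 0;
        tx_tail = TX_NEXT(tx_tail);
        freed++;
    }
    return freed;
}
```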

>
>> +             dev_kfree_skb_irq(priv->tx_skb[tx_tail]);
>> +             priv->tx_skb[tx_tail] = NULL;
>> +             tx_tail = TX_NEXT(tx_tail);
>> +             priv->tx_count--;
>
> No processing of transmit statistics?
Will add it.

>
>> +     }
>> +     priv->tx_tail = tx_tail;
>> +     spin_unlock_irq(&priv->txlock);
>
> If you have freed up any packets, then you should call netif_wake_queue().
> Do you not get any interrupts when a packet is transmitted?

Unfortunately, there is no interrupt after a packet is transmitted.
Reclaim happens only in xmit itself, so there may be no need for
netif_wake_queue/netif_stop_queue.

>
>> +}
>> +
>> +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>> +{
>> +     struct hip04_priv *priv = netdev_priv(ndev);
>> +     struct tx_desc *desc = priv->td_ring[priv->tx_head];
>> +     unsigned int tx_head = priv->tx_head;
>> +     int ret;
>> +
>> +     hip04_tx_reclaim(ndev, false);
>> +
>> +     spin_lock_irq(&priv->txlock);
>
> Same comment here...
>
>> +     if (priv->tx_count++ >= TX_DESC_NUM) {
>> +             net_dbg_ratelimited("no TX space for packet\n");
>> +             netif_stop_queue(ndev);
>> +             ret = NETDEV_TX_BUSY;
>> +             goto out_unlock;
>> +     }
>
> You shouldn't rely on this - you should stop the queue when you put the
> last packet to fill the ring before returning from this function.  When
> you clean the ring in your hip04_tx_reclaim() function, to wake the
> queue.
Since there is no completion interrupt to trigger reclaim and wake the
queue, the check can only be done in xmit.
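For reference, the pattern Russell describes — stop the queue when the slot just used fills the ring, wake it from reclaim once space frees up — reduces to simple counting. A userspace sketch with hypothetical names standing in for netif_stop_queue()/netif_wake_queue():

```c
#include <assert.h>

#define TX_DESC_NUM 8

static int tx_count;
static int queue_stopped;

/* Called from xmit after placing a packet in the ring. */
static int tx_enqueue(void)
{
    if (tx_count >= TX_DESC_NUM)
        return -1;                 /* NETDEV_TX_BUSY: should never be hit */
    if (++tx_count == TX_DESC_NUM)
        queue_stopped = 1;         /* netif_stop_queue(): ring is now full */
    return 0;
}

/* Called from reclaim for each completed descriptor. */
static void tx_complete(void)
{
    if (tx_count-- == TX_DESC_NUM)
        queue_stopped = 0;         /* netif_wake_queue(): space again */
}
```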

>
>> +
>> +     priv->tx_skb[tx_head] = skb;
>> +     dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
>> +     memset((void *)desc, 0, sizeof(*desc));
>> +     desc->send_addr = (unsigned int)virt_to_phys(skb->data);
>
> Again, dma_map_single() gives you the device address, there's no need
> to use virt_to_phys(), and there should be no need for a cast here
> either.  Also consider cpu_to_be32() and similar for the other descriptor
> writes.
Yes, cpu_to_be32 can save the memcpy.

>
>> +     desc->send_size = skb->len;
>> +     desc->cfg = DESC_DEF_CFG;
>> +     desc->wb_addr = priv->td_phys[tx_head];
>> +     endian_change(desc, 64);
>> +     skb_tx_timestamp(skb);
>> +     hip04_set_xmit_desc(priv, priv->td_phys[tx_head]);
>> +
>> +     priv->tx_head = TX_NEXT(tx_head);
>> +     ret = NETDEV_TX_OK;
>
> As mentioned above, if you have filled the ring, you need to also call
> netif_stop_queue() here.
For rx, the basic idea is to always keep RX_DESC_NUM buffers in the pool.
When a buffer is used, a new one is immediately allocated and added to the
end of the pool.

>
>> +static int hip04_mac_probe(struct platform_device *pdev)
>> +{
>> +     struct device *d = &pdev->dev;
>> +     struct device_node *node = d->of_node;
>> +     struct net_device *ndev;
>> +     struct hip04_priv *priv;
>> +     struct resource *res;
>> +     unsigned int irq, val;
>> +     int ret;
>> +
>> +     ndev = alloc_etherdev(sizeof(struct hip04_priv));
>> +     if (!ndev)
>> +             return -ENOMEM;
>> +
>> +     priv = netdev_priv(ndev);
>> +     priv->ndev = ndev;
>> +     platform_set_drvdata(pdev, ndev);
>> +     spin_lock_init(&priv->txlock);
>> +     res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> +     if (!res) {
>> +             ret = -EINVAL;
>> +             goto init_fail;
>> +     }
>> +     ndev->base_addr = res->start;
>> +     priv->base = devm_ioremap_resource(d, res);
>> +     ret = IS_ERR(priv->base);
>> +     if (ret) {
>> +             dev_err(d, "devm_ioremap_resource failed\n");
>> +             goto init_fail;
>> +     }
>
> If you're using devm_ioremap_resource(), you don't need to check the
> resource above.  In any case, returning the value from IS_ERR() from
> this function is not correct.

Got it, good idea.

>
>         res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>         priv->base = devm_ioremap_resource(d, res);
>         if (IS_ERR(priv->base) {
>                 ret = PTR_ERR(priv->base);
>                 goto init_fail;
>         }
>
> You don't need to fill in ndev->base_addr (many drivers don't.)
OK, got it.

Thanks

^ permalink raw reply	[flat|nested] 148+ messages in thread

* [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
@ 2014-03-20  9:51           ` Zhangfei Gao
  0 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-20  9:51 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Russell

Thanks for sparing time and giving so many perfect suggestion, really helpful.

On Tue, Mar 18, 2014 at 6:46 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> I was just browsing this patch when I noticed some of these issues - I
> haven't done a full review of this driver, I'm just commenting on the
> things I've spotted.
>
> On Tue, Mar 18, 2014 at 04:40:17PM +0800, Zhangfei Gao wrote:
>> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
>> +{
>> +     struct hip04_priv *priv = container_of(napi,
>> +                           struct hip04_priv, napi);
>> +     struct net_device *ndev = priv->ndev;
>> +     struct sk_buff *skb;
>> +     struct rx_desc *desc;
>> +     unsigned char *buf;
>> +     int rx = 0;
>> +     unsigned int cnt = hip04_recv_cnt(priv);
>> +     unsigned int len, tmp[16];
>> +
>> +     while (cnt) {
>> +             buf = priv->rx_buf[priv->rx_head];
>> +             skb = build_skb(buf, priv->rx_buf_size);
>> +             if (unlikely(!skb))
>> +                     net_dbg_ratelimited("build_skb failed\n");
>> +             dma_map_single(&ndev->dev, skb->data,
>> +                     RX_BUF_SIZE, DMA_FROM_DEVICE);
>
> This is incorrect.
>
> buf = buffer alloc()
> /* CPU owns buffer and can read/write it, device does not */
> dev_addr = dma_map_single(dev, buf, ..., DMA_FROM_DEVICE);
> /* Device owns buffer and can write it, CPU does not access it */
> dma_unmap_single(dev, dev_addr, ..., DMA_FROM_DEVICE);
> /* CPU owns buffer again and can read/write it, device does not */
>
> Please turn on DMA API debugging in the kernel debug options and verify
> whether your driver causes it to complain (it will.)

Yes, you are right.
After change to dma_map/unmap_single, however, still get warning like
"DMA-API: device driver failed to check map error", not sure whether
it can be ignored?

>
> I think you want dma_unmap_single() here.
>
>> +             memcpy(tmp, skb->data, 64);
>> +             endian_change((void *)tmp, 64);
>> +             desc = (struct rx_desc *)tmp;
>> +             len = desc->pkt_len;
>
> This is a rather expensive way to do this.  Presumably the descriptors
> are always big endian?  If so, why not:
>
>                 desc = skb->data;
>                 len = be16_to_cpu(desc->pkt_len);
>
> ?  You may need to lay the struct out differently for this to work so
> the offset which pkt_len accesses is correct.

Great, it is what I am looking for.
Not thought of changing layout of struct before.

>
> Also... do you not have any flags which indicate whether the packet
> received was in error?
>
>> +
>> +             if (len > RX_BUF_SIZE)
>> +                     len = RX_BUF_SIZE;
>> +             if (0 == len)
>> +                     break;
>> +
>> +             skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
>> +             skb_put(skb, len);
>> +             skb->protocol = eth_type_trans(skb, ndev);
>> +             napi_gro_receive(&priv->napi, skb);
>> +
>> +             buf = netdev_alloc_frag(priv->rx_buf_size);
>> +             if (!buf)
>> +                     return -ENOMEM;
>> +             priv->rx_buf[priv->rx_head] = buf;
>> +             dma_map_single(&ndev->dev, buf, RX_BUF_SIZE, DMA_TO_DEVICE);
>> +             hip04_set_recv_desc(priv, virt_to_phys(buf));
>
> No need for virt_to_phys() here - dma_map_single() returns the device
> address.
Got it.
Use virt_to_phys since find same result come out, it should be
different for iommu case.

In fact, the hardware can help to do the cache flushing, the function
still not be enabled now.
Then dma_map/unmap_single may be ignored.

>
>> +
>> +             priv->rx_head = RX_NEXT(priv->rx_head);
>> +             if (rx++ >= budget)
>> +                     break;
>> +
>> +             if (--cnt == 0)
>> +                     cnt = hip04_recv_cnt(priv);
>> +     }
>> +
>> +     if (rx < budget) {
>> +             napi_gro_flush(napi, false);
>> +             __napi_complete(napi);
>> +     }
>> +
>> +     /* enable rx interrupt */
>> +     priv->reg_inten |= RCV_INT | RCV_NOBUF;
>> +     writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
>
> This doesn't look right - you're supposed to re-enable receive interrupts
> when you receive less than "budget" packets.
Yes, got it.

>
>> +static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
>> +{
>> +     struct net_device *ndev = (struct net_device *) dev_id;
>> +     struct hip04_priv *priv = netdev_priv(ndev);
>> +     u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
>> +     u32 val = DEF_INT_MASK;
>> +
>> +     writel_relaxed(val, priv->base + PPE_RINT);
>> +
>> +     if ((ists & RCV_INT) || (ists & RCV_NOBUF)) {
>
> What you get with this is the compiler generating code to test RCV_INT,
> and then if that's false, code to test RCV_NOBUF.  There's no possibility
> for the compiler to optimise that because it's part of the language spec
> that condition1 || condition2 will always have condition1 evaluated first,
> and condition2 will only be evaluated if condition1 was false.
>
>         if (ists & (RCV_INT | RCV_NOBUF)) {
>
> would more than likely be more efficient here.

Cool, never think about this optimization.

>
>> +             if (napi_schedule_prep(&priv->napi)) {
>> +                     /* disable rx interrupt */
>> +                     priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
>> +                     writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
>> +                     __napi_schedule(&priv->napi);
>> +             }
>> +     }
>> +
>> +     return IRQ_HANDLED;
>> +}
>> +
>> +static void hip04_tx_reclaim(struct net_device *ndev, bool force)
>> +{
>> +     struct hip04_priv *priv = netdev_priv(ndev);
>> +     unsigned tx_head = priv->tx_head;
>> +     unsigned tx_tail = priv->tx_tail;
>> +     struct tx_desc *desc = priv->td_ring[priv->tx_tail];
>> +
>> +     spin_lock_irq(&priv->txlock);
>
> Do you know for certain that interrupts were (and always will be) definitely
> enabled prior to this point?  If not, you should use spin_lock_irqsave()..
> spin_unlock_irqrestore().
Yes,
After double check, I found spin_lock can be ignored at all here,
since xmit is protected by rcu.

>
>> +     while (tx_tail != tx_head) {
>> +             if (desc->send_addr != 0) {
>> +                     if (force)
>> +                             desc->send_addr = 0;
>> +                     else
>> +                             break;
>> +             }
>
> dma_unmap_single(&ndev->dev, dev_addr, skb->len, DMA_TO_DEVICE) ?
>
> It looks like your device zeros the send address when it has finished
> transmitting - if this is true, then you will need to store dev_addr
> separately for each transmit packet.

Yes, dev_addr is cleared after transmission over, and should be stored
for unmap.

>
>> +             dev_kfree_skb_irq(priv->tx_skb[tx_tail]);
>> +             priv->tx_skb[tx_tail] = NULL;
>> +             tx_tail = TX_NEXT(tx_tail);
>> +             priv->tx_count--;
>
> No processing of transmit statistics?
Will add it.

>
>> +     }
>> +     priv->tx_tail = tx_tail;
>> +     spin_unlock_irq(&priv->txlock);
>
> If you have freed up any packets, then you should call netif_wake_queue().
> Do you not get any interrupts when a packet is transmitted?

Unfortunately, there is no interrupt after packet is transmitted.
Reclaim only in xmit itself, so it may no need to
netif_wake_queue/netif_stop_queue.

>
>> +}
>> +
>> +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>> +{
>> +     struct hip04_priv *priv = netdev_priv(ndev);
>> +     struct tx_desc *desc = priv->td_ring[priv->tx_head];
>> +     unsigned int tx_head = priv->tx_head;
>> +     int ret;
>> +
>> +     hip04_tx_reclaim(ndev, false);
>> +
>> +     spin_lock_irq(&priv->txlock);
>
> Same comment here...
>
>> +     if (priv->tx_count++ >= TX_DESC_NUM) {
>> +             net_dbg_ratelimited("no TX space for packet\n");
>> +             netif_stop_queue(ndev);
>> +             ret = NETDEV_TX_BUSY;
>> +             goto out_unlock;
>> +     }
>
> You shouldn't rely on this - you should stop the queue when you put the
> last packet to fill the ring before returning from this function.  When
> you clean the ring in your hip04_tx_reclaim() function, to wake the
> queue.
Since there is no transmit-complete interrupt to trigger reclaim and
wake the queue, the check can only be done in xmit itself.
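For reference, a tiny stand-alone sketch of why the quoted post-increment check is fragile: `tx_count++` also bumps the counter on the busy path, so the occupancy count drifts upward. Testing first and incrementing only once the slot is committed avoids that (TX_DESC_NUM and the function name are illustrative):

```c
/* Sketch: check ring occupancy before consuming a slot, and only
 * increment the counter once the slot is actually committed.  With
 * "if (tx_count++ >= TX_DESC_NUM)" the counter is also bumped on the
 * busy path, so it drifts and the ring eventually looks full forever. */
#define TX_DESC_NUM 64

static unsigned int tx_count;

static int queue_packet(void)
{
	if (tx_count >= TX_DESC_NUM)
		return -1;		/* ring full: NETDEV_TX_BUSY */
	/* ... fill and publish the descriptor ... */
	tx_count++;			/* count only committed slots */
	return 0;
}
```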

>
>> +
>> +     priv->tx_skb[tx_head] = skb;
>> +     dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
>> +     memset((void *)desc, 0, sizeof(*desc));
>> +     desc->send_addr = (unsigned int)virt_to_phys(skb->data);
>
> Again, dma_map_single() gives you the device address, there's no need
> to use virt_to_phys(), and there should be no need for a cast here
> either.  Also consider cpu_to_be32() and similar for the other descriptor
> writes.
Yes, cpu_to_be32 would save the memcpy.

>
>> +     desc->send_size = skb->len;
>> +     desc->cfg = DESC_DEF_CFG;
>> +     desc->wb_addr = priv->td_phys[tx_head];
>> +     endian_change(desc, 64);
>> +     skb_tx_timestamp(skb);
>> +     hip04_set_xmit_desc(priv, priv->td_phys[tx_head]);
>> +
>> +     priv->tx_head = TX_NEXT(tx_head);
>> +     ret = NETDEV_TX_OK;
>
> As mentioned above, if you have filled the ring, you need to also call
> netif_stop_queue() here.
For rx, the basic idea is to always keep RX_DESC_NUM buffers in the pool:
when a buffer is consumed, immediately allocate a new one and add it to
the end of the pool.

>
>> +static int hip04_mac_probe(struct platform_device *pdev)
>> +{
>> +     struct device *d = &pdev->dev;
>> +     struct device_node *node = d->of_node;
>> +     struct net_device *ndev;
>> +     struct hip04_priv *priv;
>> +     struct resource *res;
>> +     unsigned int irq, val;
>> +     int ret;
>> +
>> +     ndev = alloc_etherdev(sizeof(struct hip04_priv));
>> +     if (!ndev)
>> +             return -ENOMEM;
>> +
>> +     priv = netdev_priv(ndev);
>> +     priv->ndev = ndev;
>> +     platform_set_drvdata(pdev, ndev);
>> +     spin_lock_init(&priv->txlock);
>> +     res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> +     if (!res) {
>> +             ret = -EINVAL;
>> +             goto init_fail;
>> +     }
>> +     ndev->base_addr = res->start;
>> +     priv->base = devm_ioremap_resource(d, res);
>> +     ret = IS_ERR(priv->base);
>> +     if (ret) {
>> +             dev_err(d, "devm_ioremap_resource failed\n");
>> +             goto init_fail;
>> +     }
>
> If you're using devm_ioremap_resource(), you don't need to check the
> resource above.  In any case, returning the value from IS_ERR() from
> this function is not correct.

Got it, good idea.

>
>         res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>         priv->base = devm_ioremap_resource(d, res);
>         if (IS_ERR(priv->base)) {
>                 ret = PTR_ERR(priv->base);
>                 goto init_fail;
>         }
>
> You don't need to fill in ndev->base_addr (many drivers don't.)
OK, got it.

Thanks

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 2/3] net: hisilicon: new hip04 MDIO driver
  2014-03-18 17:28       ` Florian Fainelli
@ 2014-03-20 10:53         ` Zhangfei Gao
  -1 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-20 10:53 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Zhangfei Gao, netdev, David S. Miller, linux-arm-kernel, devicetree

Dear Florian

On Wed, Mar 19, 2014 at 1:28 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> 2014-03-18 1:40 GMT-07:00 Zhangfei Gao <zhangfei.gao@linaro.org>:

>> +static int hip04_mdio_reset(struct mii_bus *bus)
>> +{
>> +       int temp, err, i;
>> +
>> +       for (i = 0; i < 2; i++) {
>> +               hip04_mdio_write(bus, i, 22, 0);
>> +               temp = hip04_mdio_read(bus, i, MII_BMCR);
>> +               temp |= BMCR_RESET;
>> +               err = hip04_mdio_write(bus, i, MII_BMCR, temp);
>> +               if (err < 0)
>> +                       return err;
>> +       }
>> +
>> +       mdelay(500);
>
> This does not look correct, you should iterate over all possible PHYs:
> PHY_MAX_ADDR instead of hardcoding the loop to 2.

OK, got it.
2 was used because there are only two PHYs on the board; will use
PHY_MAX_ADDR instead.
>
> I think we might want to remove the mdio bus reset callback in general
> as the PHY library should already take care of software resetting the
> PHY to put it in a sane state, as well as waiting for the appropriate
> delay before using, unlike here, where you do not poll for BMCR_RESET
> to be cleared by the PHY.
>
Do you mean moving the BMCR_RESET handling into common code? That would
be great.
The mdio_reset is added here so that the phy_id can be read; without it,
neither reading the phy_id nor PHY detection works.
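For reference, a stand-alone sketch of the shape Florian is describing: iterate over all PHY_MAX_ADDR addresses and poll BMCR until the PHY clears BMCR_RESET itself, rather than a fixed mdelay(500). The mdio_read()/mdio_write() stubs below model a PHY that self-clears the bit; the real driver would call hip04_mdio_read()/hip04_mdio_write().

```c
#define PHY_MAX_ADDR 32
#define MII_BMCR     0x00
#define BMCR_RESET   0x8000

static int regs[PHY_MAX_ADDR];		/* fake BMCR, one per PHY */

/* stubs standing in for hip04_mdio_read()/hip04_mdio_write() */
static int mdio_read(int phy, int reg)
{
	(void)reg;
	return regs[phy];
}

static int mdio_write(int phy, int reg, int val)
{
	(void)reg;
	regs[phy] = val & ~BMCR_RESET;	/* fake PHY: reset self-clears */
	return 0;
}

static int mdio_bus_reset(void)
{
	int i, val, retries;

	for (i = 0; i < PHY_MAX_ADDR; i++) {
		val = mdio_read(i, MII_BMCR);
		if (mdio_write(i, MII_BMCR, val | BMCR_RESET) < 0)
			return -1;
		/* poll for the PHY to clear BMCR_RESET itself */
		for (retries = 0; retries < 100; retries++) {
			if (!(mdio_read(i, MII_BMCR) & BMCR_RESET))
				break;
		}
		if (retries == 100)
			return -1;	/* reset never completed */
	}
	return 0;
}
```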

Thanks

* Re: [PATCH 1/3] Documentation: add Device tree bindings for Hisilicon hip04 ethernet
  2014-03-18 17:39     ` Florian Fainelli
@ 2014-03-20 11:29       ` Zhangfei Gao
  -1 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-20 11:29 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Zhangfei Gao, netdev, David S. Miller, linux-arm-kernel, devicetree

Dear Florian

On Wed, Mar 19, 2014 at 1:39 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> 2014-03-18 1:40 GMT-07:00 Zhangfei Gao <zhangfei.gao@linaro.org>:

>> +Example:
>> +       mdio {
>> +               compatible = "hisilicon,hip04-mdio";
>> +               reg = <0x28f1000 0x1000>;
>> +               #address-cells = <1>;
>> +               #size-cells = <0>;
>> +
>> +               phy0: ethernet-phy@0 {
>> +                       reg = <0>;
>> +                       marvell,reg-init = <18 0x14 0 0x8001>;
>> +                       device_type = "ethernet-phy";
>
> You are missing a compatible string such as
> "ethernet-phy-ieee802.3-c22", please take a look at
> Documentation/devicetree/bindings/net/phy.txt for an example.
>
> device_type is deprecated and should be removed.

Thanks for the info, will update.

>
>> +               };
>> +
>> +               phy1: ethernet-phy@1 {
>> +                       reg = <1>;
>> +                       marvell,reg-init = <18 0x14 0 0x8001>;
>> +                       device_type = "ethernet-phy";
>> +               };
>> +       };
>> +
>> +       ppebase: ppebase@28c0000 {
>> +               compatible = "hisilicon,hip04-ppebase";
>> +               reg = <0x28c0000 0x10000>;
>
> This should probably look like:
>
>                     #address-cells = <0>;
>                     #size-cells = <0>;
>
>                     eth0_port: port@0 {
>                          reg = <0>;
>                     };
>
>                     eth1_port: port@1f {
>                          reg = <31>;
>                     };
>
> This looks like something similar to mv643xx_eth, you should see
> Documentation/devicetree/bindings/marvell.txt for hints on how to
> model the representation in a similar fashion.

Perfect, this looks much cleaner, just like the phy description.

The ppe is a common device with 2048 channels shared by all the
controllers, as long as the channels do not overlap.
Two inputs are required:
one is the port number, currently reg = <>;
the other is the start channel, for which I used id before.
Each controller uses RX_DESC_NUM * priv->id as its start channel.
Do you think we should still use the id from of_alias_get_id(), or add
another property such as start-chan, e.g.
                        eth0_port: port@1f {
                                reg = <31>;
                                start-chan = <0>;
                        };

                        eth1_port: port@0 {
                                reg = <0>;
                                start-chan = <1>;
                        };

>
>> +       };
>> +
>> +       fe: ethernet@28b0000 {
>> +               compatible = "hisilicon,hip04-mac";
>> +               reg = <0x28b0000 0x10000>;
>> +               interrupts = <0 413 4>;
>> +               port = <31>;
>
> I do not think this is the right way to expose that, port should be
> specialized to e.g: hisilicon,port, or you should use a phandle to the
> "ppebase" node which exposes differents ports as subnodes:
>
>                     hisilicon,port-handle = <&eth0_port>;
OK, perfect.

>
>> +               speed = <100>;
>
> max-speed is the standard property for this
The speed property can be removed now; the info can be derived from
phy-mode = "sgmii".

>
>> +               id = <0>;
>
> id here is a software concept, either you create properly numbered
> aliases for these nodes, and use of_alias_get_id(), or you do not use
> these identifiers at all.

Still not sure whether to use of_alias_get_id() or add a property in
the eth_port subnode.
The id is used to specify the start channel.

Thanks

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-18 11:25       ` Arnd Bergmann
@ 2014-03-20 14:00         ` Zhangfei Gao
  -1 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-20 14:00 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Zhangfei Gao, netdev, David S. Miller, linux-arm-kernel, devicetree

Dear Arnd

On Tue, Mar 18, 2014 at 7:25 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tuesday 18 March 2014 16:40:17 Zhangfei Gao wrote:
>
>> +
>> +static void __iomem *ppebase;
>
> The global 'ppebase' seems hacky. Isn't that a SoC-specific register area, while
> the rest of the driver is reusable across SoCs?
>
> What does 'ppe' stand for?
>
> What if there are multiple instances of this, which each have their own ppebase?

On this specific platform, the ppe is the only module controlling the
fifos for all the net controllers, and each controller connects to a
specific port.
The ppe has 2048 channels, shared by all the controllers, as long as
they do not overlap.

So the static ppebase is required, which I don't like either.
Two inputs are required: one is the port the controller connects to;
the other is the start channel, for which I currently use id, with the
start channel being
RX_DESC_NUM * priv->id;  /* start_addr */

>
>> +static void hip04_config_port(struct hip04_priv *priv, u32 speed, u32 duplex)
>> +{
>> +     u32 val;
>> +
>> +     priv->speed = speed;
>> +     priv->duplex = duplex;
>> +
>> +     switch (speed) {
>> +     case SPEED_1000:
>> +             val = 8;
>> +             break;
>> +     case SPEED_100:
>> +             if (priv->id)
>> +                     val = 7;
>> +             else
>> +                     val = 1;
>> +             break;
>> +     default:
>> +             val = 0;
>> +             break;
>> +     }
>> +     writel_relaxed(val, priv->base + GE_PORT_MODE)
>
> This also seems to encode knowledge about a particular implementation
> into the driver. Maybe it's better to add a property for the port
> mode?

After checking Documentation/devicetree/bindings/net/ethernet.txt,
I think phy-mode is more suitable here.

        switch (priv->phy_mode) {
        case PHY_INTERFACE_MODE_SGMII:
                if (speed == SPEED_1000)
                        val = 8;
                else
                        val = 7;
                break;
        case PHY_INTERFACE_MODE_MII:
                val = 1;        /* SPEED_100 */
                break;
        default:
                val = 0;
                break;
        }
        writel_relaxed(val, priv->base + GE_PORT_MODE);

probe:
priv->phy_mode = of_get_phy_mode(node);

>
>
>> +static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
>> +{
>> +     writel_relaxed(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
>> +}
>> +
>> +static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
>> +{
>> +     writel_relaxed(phys, ppebase + priv->port * 4 + PPE_CFG_RX_CFF_ADDR);
>> +}
>> +
>> +static u32 hip04_recv_cnt(struct hip04_priv *priv)
>> +{
>> +     return readl_relaxed(priv->base + PPE_HIS_RX_PKT_CNT);
>> +}
>
> At the very least, the hip04_set_xmit_desc() function needs to use 'writel'
> rather than 'writel_relaxed'. Otherwise data that is being sent out
> can be stuck in the CPU's write buffers and you send stale data on the wire.
>
> For the receive path, you may or may not need to use 'readl', depending
> on how DMA is serialized by this device. If you have MSI interrupts, the
> interrupt message should already do the serialization, but if you have
> edge or level triggered interrupts, you normally need to have one readl()
> from the device register between the IRQ and the data access.

Understood, will update to readl/writel in xmit and hip04_rx_poll.
I was under the impression that *_relaxed should be used as much as
possible for better performance.

>
>
>> +static void endian_change(void *p, int size)
>> +{
>> +     unsigned int *to_cover = (unsigned int *)p;
>> +     int i;
>> +
>> +     size = size >> 2;
>> +     for (i = 0; i < size; i++)
>> +             *(to_cover+i) = htonl(*(to_cover+i));
>> +}
>> +
>> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
>> +{
>> +     struct hip04_priv *priv = container_of(napi,
>> +                           struct hip04_priv, napi);
>> +     struct net_device *ndev = priv->ndev;
>> +     struct sk_buff *skb;
>> +     struct rx_desc *desc;
>> +     unsigned char *buf;
>> +     int rx = 0;
>> +     unsigned int cnt = hip04_recv_cnt(priv);
>> +     unsigned int len, tmp[16];
>> +
>> +     while (cnt) {
>> +             buf = priv->rx_buf[priv->rx_head];
>> +             skb = build_skb(buf, priv->rx_buf_size);
>> +             if (unlikely(!skb))
>> +                     net_dbg_ratelimited("build_skb failed\n");
>> +             dma_map_single(&ndev->dev, skb->data,
>> +                     RX_BUF_SIZE, DMA_FROM_DEVICE);
>> +             memcpy(tmp, skb->data, 64);
>> +             endian_change((void *)tmp, 64);
>> +             desc = (struct rx_desc *)tmp;
>> +             len = desc->pkt_len;
>
> The dma_map_single() seems misplaced here, for all I can tell, the
> data has already been transferred. Maybe you mean dma_unmap_single?
>
> I don't see why you copy 64 bytes out of the buffer using endian_change,
> rather than just looking at the first word, which seems to have the
> only value you are interested in.
Russell suggested using be16_to_cpu etc. to replace the memcpy.

>
>> +             if (len > RX_BUF_SIZE)
>> +                     len = RX_BUF_SIZE;
>> +             if (0 == len)
>> +                     break;
>> +
>> +             skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
>> +             skb_put(skb, len);
>> +             skb->protocol = eth_type_trans(skb, ndev);
>> +             napi_gro_receive(&priv->napi, skb);
>> +
>> +             buf = netdev_alloc_frag(priv->rx_buf_size);
>> +             if (!buf)
>> +                     return -ENOMEM;
>> +             priv->rx_buf[priv->rx_head] = buf;
>> +             dma_map_single(&ndev->dev, buf, RX_BUF_SIZE, DMA_TO_DEVICE);
>
> Maybe you mean DMA_FROM_DEVICE? The call here doesn't seem to make any
> sense. You also need to use the return value of dma_map_single() every
> time you call it.
>
>> +             hip04_set_recv_desc(priv, virt_to_phys(buf));
>
> and put it right here in the next line. virt_to_phys() is not the correct
> function call, that is what dma_map_single() is meant for.

OK.
>
>> +             priv->rx_head = RX_NEXT(priv->rx_head);
>> +             if (rx++ >= budget)
>> +                     break;
>> +
>> +             if (--cnt == 0)
>> +                     cnt = hip04_recv_cnt(priv);
>> +     }
>> +
>> +     if (rx < budget) {
>> +             napi_gro_flush(napi, false);
>> +             __napi_complete(napi);
>> +     }
>> +
>> +     /* enable rx interrupt */
>> +     priv->reg_inten |= RCV_INT | RCV_NOBUF;
>> +     writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
>
>
> Why do you unconditionally turn on interrupts here? Shouldn't you
> only do that after calling napi_complete()?
Yes, right.

>
>
>> +
>> +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>> +{
>> +     struct hip04_priv *priv = netdev_priv(ndev);
>> +     struct tx_desc *desc = priv->td_ring[priv->tx_head];
>> +     unsigned int tx_head = priv->tx_head;
>> +     int ret;
>> +
>> +     hip04_tx_reclaim(ndev, false);
>> +
>> +     spin_lock_irq(&priv->txlock);
>> +     if (priv->tx_count++ >= TX_DESC_NUM) {
>> +             net_dbg_ratelimited("no TX space for packet\n");
>> +             netif_stop_queue(ndev);
>> +             ret = NETDEV_TX_BUSY;
>> +             goto out_unlock;
>> +     }
>> +
>> +     priv->tx_skb[tx_head] = skb;
>> +     dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
>> +     memset((void *)desc, 0, sizeof(*desc));
>> +     desc->send_addr = (unsigned int)virt_to_phys(skb->data);
>
> Just like above: you must not use virt_to_phys here, but rather use
> the output of dma_map_single.
>
> IIRC, you can't generally call dma_map_single() under a spinlock, so
> better move that ahead. It may also be a slow operation.
The spinlock will be removed here, since this path is already protected
by RCU outside.

>
>> +
>> +static int hip04_mac_open(struct net_device *ndev)
>> +{
>> +     struct hip04_priv *priv = netdev_priv(ndev);
>> +     int i;
>> +
>> +     hip04_reset_ppe(priv);
>> +     for (i = 0; i < RX_DESC_NUM; i++) {
>> +             dma_map_single(&ndev->dev, priv->rx_buf[i],
>> +                             RX_BUF_SIZE, DMA_TO_DEVICE);
>> +             hip04_set_recv_desc(priv, virt_to_phys(priv->rx_buf[i]));
>> +     }
>
> And one more. Also DMA_FROM_DEVICE.
>
>> +static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
>> +{
>> +     struct hip04_priv *priv = netdev_priv(ndev);
>> +     int i;
>> +
>> +     priv->rx_buf_size = RX_BUF_SIZE +
>> +                         SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>> +
>> +     priv->desc_pool = dma_pool_create(DRV_NAME, d, sizeof(struct tx_desc),
>> +                             SKB_DATA_ALIGN(sizeof(struct tx_desc)), 0);
>> +     if (!priv->desc_pool)
>> +             return -ENOMEM;
>> +
>> +     for (i = 0; i < TX_DESC_NUM; i++) {
>> +             priv->td_ring[i] = dma_pool_alloc(priv->desc_pool,
>> +                                     GFP_ATOMIC, &priv->td_phys[i]);
>> +             if (!priv->td_ring[i])
>> +                     return -ENOMEM;
>> +     }
>
> Why do you create a dma pool here, when you do all the allocations upfront?
>
> It looks to me like you could simply turn the td_ring array of pointers
> to tx descriptors into a an array of tx descriptors (no pointers) and allocate
> that one using dma_alloc_coherent.

The dma pool is used here mainly because of alignment: the descriptor
has an SKB_DATA_ALIGN alignment requirement, so this is the simplest
way:
        priv->desc_pool = dma_pool_create(DRV_NAME, d, sizeof(struct tx_desc),
                                SKB_DATA_ALIGN(sizeof(struct tx_desc)), 0);

>
>
>> +     if (!ppebase) {
>> +             struct device_node *n;
>> +
>> +             n = of_find_compatible_node(NULL, NULL, "hisilicon,hip04-ppebase");
>> +             if (!n) {
>> +                     ret = -EINVAL;
>> +                     netdev_err(ndev, "not find hisilicon,ppebase\n");
>> +                     goto init_fail;
>> +             }
>> +             ppebase = of_iomap(n, 0);
>> +     }
>
> How about using syscon_regmap_lookup_by_phandle() here? That way, you can have
> a more generic abstraction of the ppe, and stick the port and id in there as
> well, e.g.
>
>         ppe-syscon = <&hip04ppe 1 4>; // ppe, port, id

To be honest, I am not familiar with syscon_regmap_lookup_by_phandle.

Florian has suggested
              ppe: ppe@28c0000 {
                        compatible = "hisilicon,hip04-ppe";
                        reg = <0x28c0000 0x10000>;
                        #address-cells = <1>;
                        #size-cells = <0>;

                        eth0_port: port@1f {
                                reg = <31>;
                        };

                        eth1_port: port@0 {
                                reg = <0>;
                        };

                        eth2_port: port@8 {
                                reg = <8>;
                        };
                };
                fe: ethernet@28b0000 {
                        compatible = "hisilicon,hip04-mac";
                        reg = <0x28b0000 0x10000>;
                        interrupts = <0 413 4>;
                        phy-mode = "mii";
                        port-handle = <&eth0_port>;
                };
And the port info can be found via the port-handle:
n = of_parse_phandle(node, "port-handle", 0);
ret = of_property_read_u32(n, "reg", &priv->port);

The id is the controller's start channel in the ppe; it can come either
from an alias or from another property in the port node.

>
>> +     ret = of_property_read_u32(node, "speed", &val);
>> +     if (ret) {
>> +             dev_warn(d, "not find speed info\n");
>> +             priv->speed = SPEED_1000;
>> +     }
>> +
>> +     if (SPEED_100 == val)
>> +             priv->speed = SPEED_100;
>> +     else
>> +             priv->speed = SPEED_1000;
>> +     priv->duplex = DUPLEX_FULL;
>
> Why do you even need the speed here, shouldn't you get that information
> from the phy through hip04_adjust_link?

There is still a 100M MAC without a PHY, and thus without adjust_link.
Will use phy-mode instead of speed.

Thanks

^ permalink raw reply	[flat|nested] 148+ messages in thread

* [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
@ 2014-03-20 14:00         ` Zhangfei Gao
  0 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-20 14:00 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Arnd

On Tue, Mar 18, 2014 at 7:25 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tuesday 18 March 2014 16:40:17 Zhangfei Gao wrote:
>
>> +
>> +static void __iomem *ppebase;
>
> The global 'ppebase' seems hacky. Isn't that a SoC-specific register area, while
> the rest of the driver is reusable across SoCs?
>
> What does 'ppe' stand for?
>
> What if there are multiple instances of this, which each have their own ppebase?

In this specific platform,
ppe is the only module controlling all the fifos for all the net controller.
And each controller connect to specific port.
ppe has 2048 channels, sharing by all the controller, only if not overlapped.

So the static ppebase is required, which I don't like too.
Two inputs required, one is port, which is connect to the controller.
The other is start channel, currently I use id, and start channel is
RX_DESC_NUM * priv->id;  /* start_addr */

>
>> +static void hip04_config_port(struct hip04_priv *priv, u32 speed, u32 duplex)
>> +{
>> +     u32 val;
>> +
>> +     priv->speed = speed;
>> +     priv->duplex = duplex;
>> +
>> +     switch (speed) {
>> +     case SPEED_1000:
>> +             val = 8;
>> +             break;
>> +     case SPEED_100:
>> +             if (priv->id)
>> +                     val = 7;
>> +             else
>> +                     val = 1;
>> +             break;
>> +     default:
>> +             val = 0;
>> +             break;
>> +     }
>> +     writel_relaxed(val, priv->base + GE_PORT_MODE)
>
> This also seems to encode knowledge about a particular implementation
> into the driver. Maybe it's better to add a property for the port
> mode?

After check Documentation/devicetree/bindings/net/ethernet.txt,
I think phy-mode is more suitable here.

        switch (priv->phy_mode) {
        case PHY_INTERFACE_MODE_SGMII:
                if (speed == SPEED_1000)
                        val = 8;
                else
                        val = 7;
                break;
        case PHY_INTERFACE_MODE_MII:
                val = 1;        /* SPEED_100 */
                break;
        default:
                val = 0;
                break;
        }
        writel_relaxed(val, priv->base + GE_PORT_MODE);

probe:
priv->phy_mode = of_get_phy_mode(node);

>
>
>> +static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
>> +{
>> +     writel_relaxed(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
>> +}
>> +
>> +static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
>> +{
>> +     writel_relaxed(phys, ppebase + priv->port * 4 + PPE_CFG_RX_CFF_ADDR);
>> +}
>> +
>> +static u32 hip04_recv_cnt(struct hip04_priv *priv)
>> +{
>> +     return readl_relaxed(priv->base + PPE_HIS_RX_PKT_CNT);
>> +}
>
> At the very least, the hip04_set_xmit_desc() function needs to use 'writel'
> rather than 'writel_relaxed'. Otherwise data that is being sent out
> can be stuck in the CPU's write buffers and you send stale data on the wire.
>
> For the receive path, you may or may not need to use 'readl', depending
> on how DMA is serialized by this device. If you have MSI interrupts, the
> interrupt message should already do the serialization, but if you have
> edge or level triggered interrupts, you normally need to have one readl()
> from the device register between the IRQ and the data access.

Really, will update to readl/wirtel in xmit and hip04_rx_poll.
I just got impression, use *_relaxed as much as possible for better performance.

>
>
>> +static void endian_change(void *p, int size)
>> +{
>> +     unsigned int *to_cover = (unsigned int *)p;
>> +     int i;
>> +
>> +     size = size >> 2;
>> +     for (i = 0; i < size; i++)
>> +             *(to_cover+i) = htonl(*(to_cover+i));
>> +}
>> +
>> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
>> +{
>> +     struct hip04_priv *priv = container_of(napi,
>> +                           struct hip04_priv, napi);
>> +     struct net_device *ndev = priv->ndev;
>> +     struct sk_buff *skb;
>> +     struct rx_desc *desc;
>> +     unsigned char *buf;
>> +     int rx = 0;
>> +     unsigned int cnt = hip04_recv_cnt(priv);
>> +     unsigned int len, tmp[16];
>> +
>> +     while (cnt) {
>> +             buf = priv->rx_buf[priv->rx_head];
>> +             skb = build_skb(buf, priv->rx_buf_size);
>> +             if (unlikely(!skb))
>> +                     net_dbg_ratelimited("build_skb failed\n");
>> +             dma_map_single(&ndev->dev, skb->data,
>> +                     RX_BUF_SIZE, DMA_FROM_DEVICE);
>> +             memcpy(tmp, skb->data, 64);
>> +             endian_change((void *)tmp, 64);
>> +             desc = (struct rx_desc *)tmp;
>> +             len = desc->pkt_len;
>
> The dma_map_single() seems misplaced here, for all I can tell, the
> data has already been transferred. Maybe you mean dma_unmap_single?
>
> I don't see why you copy 64 bytes out of the buffer using endian_change,
> rather than just looking at the first word, which seems to have the
> only value you are interested in.
Russell suggested using be16_to_cpu etc to replace memcpy.

>
>> +             if (len > RX_BUF_SIZE)
>> +                     len = RX_BUF_SIZE;
>> +             if (0 == len)
>> +                     break;
>> +
>> +             skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
>> +             skb_put(skb, len);
>> +             skb->protocol = eth_type_trans(skb, ndev);
>> +             napi_gro_receive(&priv->napi, skb);
>> +
>> +             buf = netdev_alloc_frag(priv->rx_buf_size);
>> +             if (!buf)
>> +                     return -ENOMEM;
>> +             priv->rx_buf[priv->rx_head] = buf;
>> +             dma_map_single(&ndev->dev, buf, RX_BUF_SIZE, DMA_TO_DEVICE);
>
> Maybe you mean DMA_FROM_DEVICE? The call here doesn't seem to make any
> sense. You also need to use the return value of dma_map_single() every
> time you call it.
>
>> +             hip04_set_recv_desc(priv, virt_to_phys(buf));
>
> and put it right here in the next line. virt_to_phys() is not the correct
> function call, that is what dma_map_single() is meant for.

OK.
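For illustration, the fix being suggested can be modeled on the host: the address handed to the hardware is the one returned by dma_map_single(), not virt_to_phys(). Everything below is a stand-in sketch — the fake mapping offset, and these versions of dma_map_single()/hip04_set_recv_desc(), are not the kernel APIs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

static dma_addr_t last_recv_desc;

/* Fake mapping: pretend the bus sees buffers at a 0x80000000 offset,
 * so a mapped address is visibly different from the CPU pointer. */
static dma_addr_t dma_map_single(void *buf, size_t len)
{
	(void)len;
	return (dma_addr_t)(uintptr_t)buf + 0x80000000ull;
}

/* Stand-in for the writel() to PPE_CFG_RX_CFF_ADDR. */
static void hip04_set_recv_desc(dma_addr_t phys)
{
	last_recv_desc = phys;
}

/* Refill one RX slot: hand the *mapped* address to the hardware,
 * rather than virt_to_phys(buf). */
static dma_addr_t rx_refill(void *buf, size_t buf_size)
{
	dma_addr_t phys = dma_map_single(buf, buf_size);

	hip04_set_recv_desc(phys);
	return phys;
}
```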
>
>> +             priv->rx_head = RX_NEXT(priv->rx_head);
>> +             if (rx++ >= budget)
>> +                     break;
>> +
>> +             if (--cnt == 0)
>> +                     cnt = hip04_recv_cnt(priv);
>> +     }
>> +
>> +     if (rx < budget) {
>> +             napi_gro_flush(napi, false);
>> +             __napi_complete(napi);
>> +     }
>> +
>> +     /* enable rx interrupt */
>> +     priv->reg_inten |= RCV_INT | RCV_NOBUF;
>> +     writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
>
>
> Why do you unconditionally turn on interrupts here? Shouldn't you
> only do that after calling napi_complete()?
Yes, right.
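The corrected ordering can be sketched with the kernel calls stubbed out as plain flags — only the control flow here is real, the names are stand-ins:

```c
#include <assert.h>
#include <stdbool.h>

static bool napi_scheduled = true;
static bool rx_irq_enabled;

/* Stubs standing in for napi_complete() and the PPE_INTEN write. */
static void napi_complete(void)  { napi_scheduled = false; }
static void enable_rx_irq(void)  { rx_irq_enabled = true; }

/* Tail of the poll loop: re-enable the RX interrupt only after
 * napi_complete(), i.e. only when the budget was not exhausted.
 * If the budget was used up, stay in polling mode with the IRQ off. */
static int rx_poll_tail(int rx, int budget)
{
	if (rx < budget) {
		napi_complete();
		enable_rx_irq();
	}
	return rx;
}
```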

>
>
>> +
>> +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>> +{
>> +     struct hip04_priv *priv = netdev_priv(ndev);
>> +     struct tx_desc *desc = priv->td_ring[priv->tx_head];
>> +     unsigned int tx_head = priv->tx_head;
>> +     int ret;
>> +
>> +     hip04_tx_reclaim(ndev, false);
>> +
>> +     spin_lock_irq(&priv->txlock);
>> +     if (priv->tx_count++ >= TX_DESC_NUM) {
>> +             net_dbg_ratelimited("no TX space for packet\n");
>> +             netif_stop_queue(ndev);
>> +             ret = NETDEV_TX_BUSY;
>> +             goto out_unlock;
>> +     }
>> +
>> +     priv->tx_skb[tx_head] = skb;
>> +     dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
>> +     memset((void *)desc, 0, sizeof(*desc));
>> +     desc->send_addr = (unsigned int)virt_to_phys(skb->data);
>
> Just like above: you must not use virt_to_phys here, but rather use
> the output of dma_map_single.
>
> IIRC, you can't generally call dma_map_single() under a spinlock, so
> better move that ahead. It may also be a slow operation.
The spinlock will be removed here, since this path is protected by RCU outside.
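The descriptor accounting in the xmit path can be modeled as a plain power-of-two ring; TX_DESC_NUM and the helper names below are illustrative, not the driver's:

```c
#include <assert.h>

#define TX_DESC_NUM 256				/* must be a power of two */
#define TX_NEXT(i)  (((i) + 1) & (TX_DESC_NUM - 1))

struct tx_ring {
	unsigned int head;	/* next descriptor to fill */
	unsigned int tail;	/* next descriptor to reclaim */
	unsigned int count;	/* descriptors currently in flight */
};

/* Returns 0 on success, -1 when full (the caller would stop the queue
 * and return NETDEV_TX_BUSY). */
static int tx_ring_add(struct tx_ring *r)
{
	if (r->count >= TX_DESC_NUM)
		return -1;
	r->head = TX_NEXT(r->head);
	r->count++;
	return 0;
}

/* Called from the reclaim path once the hardware finished a descriptor. */
static void tx_ring_reclaim(struct tx_ring *r)
{
	if (r->count) {
		r->tail = TX_NEXT(r->tail);
		r->count--;
	}
}
```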

>
>> +
>> +static int hip04_mac_open(struct net_device *ndev)
>> +{
>> +     struct hip04_priv *priv = netdev_priv(ndev);
>> +     int i;
>> +
>> +     hip04_reset_ppe(priv);
>> +     for (i = 0; i < RX_DESC_NUM; i++) {
>> +             dma_map_single(&ndev->dev, priv->rx_buf[i],
>> +                             RX_BUF_SIZE, DMA_TO_DEVICE);
>> +             hip04_set_recv_desc(priv, virt_to_phys(priv->rx_buf[i]));
>> +     }
>
> And one more. Also DMA_FROM_DEVICE.
>
>> +static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
>> +{
>> +     struct hip04_priv *priv = netdev_priv(ndev);
>> +     int i;
>> +
>> +     priv->rx_buf_size = RX_BUF_SIZE +
>> +                         SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>> +
>> +     priv->desc_pool = dma_pool_create(DRV_NAME, d, sizeof(struct tx_desc),
>> +                             SKB_DATA_ALIGN(sizeof(struct tx_desc)), 0);
>> +     if (!priv->desc_pool)
>> +             return -ENOMEM;
>> +
>> +     for (i = 0; i < TX_DESC_NUM; i++) {
>> +             priv->td_ring[i] = dma_pool_alloc(priv->desc_pool,
>> +                                     GFP_ATOMIC, &priv->td_phys[i]);
>> +             if (!priv->td_ring[i])
>> +                     return -ENOMEM;
>> +     }
>
> Why do you create a dma pool here, when you do all the allocations upfront?
>
> It looks to me like you could simply turn the td_ring array of pointers
> to tx descriptors into a an array of tx descriptors (no pointers) and allocate
> that one using dma_alloc_coherent.

The dma pool is used here mainly because of alignment:
the descriptors have an SKB_DATA_ALIGN requirement,
so the simplest way is
        priv->desc_pool = dma_pool_create(DRV_NAME, d, sizeof(struct tx_desc),
                                SKB_DATA_ALIGN(sizeof(struct tx_desc)), 0);

>
>
>> +     if (!ppebase) {
>> +             struct device_node *n;
>> +
>> +             n = of_find_compatible_node(NULL, NULL, "hisilicon,hip04-ppebase");
>> +             if (!n) {
>> +                     ret = -EINVAL;
>> +                     netdev_err(ndev, "not find hisilicon,ppebase\n");
>> +                     goto init_fail;
>> +             }
>> +             ppebase = of_iomap(n, 0);
>> +     }
>
> How about using syscon_regmap_lookup_by_phandle() here? That way, you can have
> a more generic abstraction of the ppe, and stick the port and id in there as
> well, e.g.
>
>         ppe-syscon = <&hip04ppe 1 4>; // ppe, port, id

To be honest, I am not familiar with syscon_regmap_lookup_by_phandle.

Florian has suggested
              ppe: ppe@28c0000 {
                        compatible = "hisilicon,hip04-ppe";
                        reg = <0x28c0000 0x10000>;
                        #address-cells = <1>;
                        #size-cells = <0>;

                        eth0_port: port@1f {
                                reg = <31>;
                        };

                        eth1_port: port@0 {
                                reg = <0>;
                        };

                        eth2_port: port@8 {
                                reg = <8>;
                        };
                };
                fe: ethernet@28b0000 {
                        compatible = "hisilicon,hip04-mac";
                        reg = <0x28b0000 0x10000>;
                        interrupts = <0 413 4>;
                        phy-mode = "mii";
                        port-handle = <&eth0_port>;
                };
And the port info can be found from port-handle
n = of_parse_phandle(node, "port-handle", 0);
ret = of_property_read_u32(n, "reg", &priv->port);

The id is the controller's start channel in the ppe; either an alias or
another property in the port node can provide it.

>
>> +     ret = of_property_read_u32(node, "speed", &val);
>> +     if (ret) {
>> +             dev_warn(d, "not find speed info\n");
>> +             priv->speed = SPEED_1000;
>> +     }
>> +
>> +     if (SPEED_100 == val)
>> +             priv->speed = SPEED_100;
>> +     else
>> +             priv->speed = SPEED_1000;
>> +     priv->duplex = DUPLEX_FULL;
>
> Why do you even need the speed here, shouldn't you get that information
> from the phy through hip04_adjust_link?

There is still a 100M MAC without a phy, and hence no adjust_link.
Will use phy-mode instead of speed.

Thanks

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-20 14:00         ` Zhangfei Gao
@ 2014-03-20 14:31           ` Arnd Bergmann
  -1 siblings, 0 replies; 148+ messages in thread
From: Arnd Bergmann @ 2014-03-20 14:31 UTC (permalink / raw)
  To: Zhangfei Gao
  Cc: Zhangfei Gao, netdev, David S. Miller, linux-arm-kernel, devicetree

On Thursday 20 March 2014, Zhangfei Gao wrote:
> On Tue, Mar 18, 2014 at 7:25 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Tuesday 18 March 2014 16:40:17 Zhangfei Gao wrote:
> >
> >> +
> >> +static void __iomem *ppebase;
> >
> > The global 'ppebase' seems hacky. Isn't that a SoC-specific register area, while
> > the rest of the driver is reusable across SoCs?
> >
> > What does 'ppe' stand for?
> >
> > What if there are multiple instances of this, which each have their own ppebase?
> 
> In this specific platform,
> ppe is the only module controlling all the FIFOs for all the net controllers,
> and each controller connects to a specific port.
> ppe has 2048 channels, shared by all the controllers, as long as they do not overlap.
> 
> So the static ppebase is required, which I don't like either.
> Two inputs are required: one is the port, which connects to the controller.
> The other is the start channel; currently I use id, and the start channel is
> RX_DESC_NUM * priv->id;  /* start_addr */

Ok, thanks for the explanation!

> > This also seems to encode knowledge about a particular implementation
> > into the driver. Maybe it's better to add a property for the port
> > mode?
> 
> After checking Documentation/devicetree/bindings/net/ethernet.txt,
> I think phy-mode is more suitable here.
> 
>         switch (priv->phy_mode) {
>         case PHY_INTERFACE_MODE_SGMII:
>                 if (speed == SPEED_1000)
>                         val = 8;
>                 else
>                         val = 7;
>                 break;
>         case PHY_INTERFACE_MODE_MII:
>                 val = 1;        /* SPEED_100 */
>                 break;
>         default:
>                 val = 0;
>                 break;
>         }
>         writel_relaxed(val, priv->base + GE_PORT_MODE);
> 
> probe:
> priv->phy_mode = of_get_phy_mode(node);

Yes, this looks better, but where does 'speed' come from now? I assume
even in SGMII mode, you should allow autonegotiation and set this correctly
from the PHY code. Is that what you do here?

> >> +static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
> >> +{
> >> +     writel_relaxed(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
> >> +}
> >> +
> >> +static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
> >> +{
> >> +     writel_relaxed(phys, ppebase + priv->port * 4 + PPE_CFG_RX_CFF_ADDR);
> >> +}
> >> +
> >> +static u32 hip04_recv_cnt(struct hip04_priv *priv)
> >> +{
> >> +     return readl_relaxed(priv->base + PPE_HIS_RX_PKT_CNT);
> >> +}
> >
> > At the very least, the hip04_set_xmit_desc() function needs to use 'writel'
> > rather than 'writel_relaxed'. Otherwise data that is being sent out
> > can be stuck in the CPU's write buffers and you send stale data on the wire.
> >
> > For the receive path, you may or may not need to use 'readl', depending
> > on how DMA is serialized by this device. If you have MSI interrupts, the
> > interrupt message should already do the serialization, but if you have
> > edge or level triggered interrupts, you normally need to have one readl()
> > from the device register between the IRQ and the data access.
> 
> Really, will update to readl/writel in xmit and hip04_rx_poll.
> I just had the impression that *_relaxed should be used as much as possible for better performance.

Ok. The _relaxed() versions are really meant for people that understand
the ordering requirements. The regular readl/writel accessors contain
extra barriers to make them equivalent to what x86 does.

> >> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
> >> +{
> >> +     struct hip04_priv *priv = container_of(napi,
> >> +                           struct hip04_priv, napi);
> >> +     struct net_device *ndev = priv->ndev;
> >> +     struct sk_buff *skb;
> >> +     struct rx_desc *desc;
> >> +     unsigned char *buf;
> >> +     int rx = 0;
> >> +     unsigned int cnt = hip04_recv_cnt(priv);
> >> +     unsigned int len, tmp[16];
> >> +
> >> +     while (cnt) {
> >> +             buf = priv->rx_buf[priv->rx_head];
> >> +             skb = build_skb(buf, priv->rx_buf_size);
> >> +             if (unlikely(!skb))
> >> +                     net_dbg_ratelimited("build_skb failed\n");
> >> +             dma_map_single(&ndev->dev, skb->data,
> >> +                     RX_BUF_SIZE, DMA_FROM_DEVICE);
> >> +             memcpy(tmp, skb->data, 64);
> >> +             endian_change((void *)tmp, 64);
> >> +             desc = (struct rx_desc *)tmp;
> >> +             len = desc->pkt_len;
> >
> > The dma_map_single() seems misplaced here, for all I can tell, the
> > data has already been transferred. Maybe you mean dma_unmap_single?
> >
> > I don't see why you copy 64 bytes out of the buffer using endian_change,
> > rather than just looking at the first word, which seems to have the
> > only value you are interested in.
> Russell suggested using be16_to_cpu etc to replace memcpy.

Right, but doesn't it have to be be32_to_cpu?
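For reference, a portable equivalent of the be32_to_cpu() read being discussed — pulling one big-endian word out of the descriptor instead of byte-swapping a 64-byte copy (the actual field offset would be whatever the hardware defines):

```c
#include <assert.h>
#include <stdint.h>

/* Read one 32-bit big-endian word, as be32_to_cpu() would on a
 * little-endian host. */
static uint32_t be32_read(const void *p)
{
	const uint8_t *b = p;

	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}
```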

> >> +static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
> >> +{
> >> +     struct hip04_priv *priv = netdev_priv(ndev);
> >> +     int i;
> >> +
> >> +     priv->rx_buf_size = RX_BUF_SIZE +
> >> +                         SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> >> +
> >> +     priv->desc_pool = dma_pool_create(DRV_NAME, d, sizeof(struct tx_desc),
> >> +                             SKB_DATA_ALIGN(sizeof(struct tx_desc)), 0);
> >> +     if (!priv->desc_pool)
> >> +             return -ENOMEM;
> >> +
> >> +     for (i = 0; i < TX_DESC_NUM; i++) {
> >> +             priv->td_ring[i] = dma_pool_alloc(priv->desc_pool,
> >> +                                     GFP_ATOMIC, &priv->td_phys[i]);
> >> +             if (!priv->td_ring[i])
> >> +                     return -ENOMEM;
> >> +     }
> >
> > Why do you create a dma pool here, when you do all the allocations upfront?
> >
> > It looks to me like you could simply turn the td_ring array of pointers
> > to tx descriptors into a an array of tx descriptors (no pointers) and allocate
> > that one using dma_alloc_coherent.
> 
> The dma pool is used here mainly because of alignment:
> the descriptors have an SKB_DATA_ALIGN requirement,
> so the simplest way is
>         priv->desc_pool = dma_pool_create(DRV_NAME, d, sizeof(struct tx_desc),
>                                 SKB_DATA_ALIGN(sizeof(struct tx_desc)), 0);

dma_alloc_coherent() will actually give you PAGE_SIZE alignment, so that's
still easier.
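A sketch of that layout, with calloc() standing in for dma_alloc_coherent(): one coherent block holds the whole descriptor array, and each descriptor's DMA address is simply base + i * sizeof(desc). The struct fields and names are illustrative, not the driver's real layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TX_DESC_NUM 64

struct tx_desc {
	uint32_t send_addr;
	uint32_t send_size;
	uint32_t cfg;
	uint32_t wb_addr;
};

struct tx_ring {
	struct tx_desc *desc;	/* CPU address of desc[0] */
	uint64_t desc_phys;	/* DMA address of desc[0] */
};

/* One allocation for the whole ring; dma_alloc_coherent() would return
 * both the CPU pointer and the DMA address, with PAGE_SIZE alignment. */
static int tx_ring_alloc(struct tx_ring *r)
{
	r->desc = calloc(TX_DESC_NUM, sizeof(struct tx_desc));
	if (!r->desc)
		return -1;
	r->desc_phys = (uint64_t)(uintptr_t)r->desc;	/* stand-in */
	return 0;
}

/* Per-descriptor DMA address is pure arithmetic, no dma_pool needed. */
static uint64_t tx_desc_phys(const struct tx_ring *r, unsigned int i)
{
	return r->desc_phys + i * sizeof(struct tx_desc);
}
```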

> >
> >
> >> +     if (!ppebase) {
> >> +             struct device_node *n;
> >> +
> >> +             n = of_find_compatible_node(NULL, NULL, "hisilicon,hip04-ppebase");
> >> +             if (!n) {
> >> +                     ret = -EINVAL;
> >> +                     netdev_err(ndev, "not find hisilicon,ppebase\n");
> >> +                     goto init_fail;
> >> +             }
> >> +             ppebase = of_iomap(n, 0);
> >> +     }
> >
> > How about using syscon_regmap_lookup_by_phandle() here? That way, you can have
> > a more generic abstraction of the ppe, and stick the port and id in there as
> > well, e.g.
> >
> >         ppe-syscon = <&hip04ppe 1 4>; // ppe, port, id
> 
> To be honest, I am not familiar with syscon_regmap_lookup_by_phandle.
> 
> Florian has suggested
>               ppe: ppe@28c0000 {
>                         compatible = "hisilicon,hip04-ppe";
>                         reg = <0x28c0000 0x10000>;
>                         #address-cells = <1>;
>                         #size-cells = <0>;
> 
>                         eth0_port: port@1f {
>                                 reg = <31>;

		minor comment: I'd use 0x1f for the reg too.

>                         };
> 
>                         eth1_port: port@0 {
>                                 reg = <0>;
>                         };
> 
>                         eth2_port: port@8 {
>                                 reg = <8>;
>                         };
>                 };
>                 fe: ethernet@28b0000 {
>                         compatible = "hisilicon,hip04-mac";
>                         reg = <0x28b0000 0x10000>;
>                         interrupts = <0 413 4>;
>                         phy-mode = "mii";
>                         port-handle = <&eth0_port>;
>                 };
> And the port info can be found from port-handle
> n = of_parse_phandle(node, "port-handle", 0);
> ret = of_property_read_u32(n, "reg", &priv->port);
> 
> The id is the controller start channel in ppe, either use alias or
> another property in the port.

Yes, this seems fine as well, as long as you are sure that the ppe
device is only used by hip04 ethernet devices, I think it
would get messy if your hardware shares them with other units.

It's probably a little simpler to avoid the sub-nodes and instead do

>               ppe: ppe@28c0000 {
>                         compatible = "hisilicon,hip04-ppe";
>                         reg = <0x28c0000 0x10000>;
>                         #address-cells = <1>;
>                         #size-cells = <0>;
>                 };
>                 fe: ethernet@28b0000 {
>                         compatible = "hisilicon,hip04-mac";
>                         reg = <0x28b0000 0x10000>;
>                         interrupts = <0 413 4>;
>                         phy-mode = "mii";
>                         port-handle = <&ppe 31>;
>                 };

In the code, I would create a simple ppe driver that exports
the few functions you need. In the ethernet driver probe() function,
you should get a handle to the ppe using

	/* look up "port-handle" property of the current device, find ppe and port */
	struct hip04_ppe *ppe = hip04_ppe_get(dev->of_node); 
	if (IS_ERR(ppe))
		return PTR_ERR(ppe); /* this handles -EPROBE_DEFER */

and then in other code you can just do

	hip04_ppe_set_foo(priv->ppe, foo_config);

This is a somewhat higher-level abstraction than syscon, which
just gives you a 'struct regmap' structure for doing register-level
configuration like you have today.

> >> +     ret = of_property_read_u32(node, "speed", &val);
> >> +     if (ret) {
> >> +             dev_warn(d, "not find speed info\n");
> >> +             priv->speed = SPEED_1000;
> >> +     }
> >> +
> >> +     if (SPEED_100 == val)
> >> +             priv->speed = SPEED_100;
> >> +     else
> >> +             priv->speed = SPEED_1000;
> >> +     priv->duplex = DUPLEX_FULL;
> >
> > Why do you even need the speed here, shouldn't you get that information
> > from the phy through hip04_adjust_link?
> 
> There is still a 100M MAC without a phy, and hence no adjust_link.
> Will use phy-mode instead of speed.

Ok.

	Arnd

^ permalink raw reply	[flat|nested] 148+ messages in thread


* Re: [PATCH 2/3] net: hisilicon: new hip04 MDIO driver
  2014-03-20 10:53         ` Zhangfei Gao
@ 2014-03-20 17:59           ` Florian Fainelli
  -1 siblings, 0 replies; 148+ messages in thread
From: Florian Fainelli @ 2014-03-20 17:59 UTC (permalink / raw)
  To: Zhangfei Gao
  Cc: Zhangfei Gao, netdev, David S. Miller, linux-arm-kernel, devicetree

2014-03-20 3:53 GMT-07:00 Zhangfei Gao <zhangfei.gao@gmail.com>:
> Dear Florian
>
> On Wed, Mar 19, 2014 at 1:28 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>> 2014-03-18 1:40 GMT-07:00 Zhangfei Gao <zhangfei.gao@linaro.org>:
>
>>> +static int hip04_mdio_reset(struct mii_bus *bus)
>>> +{
>>> +       int temp, err, i;
>>> +
>>> +       for (i = 0; i < 2; i++) {
>>> +               hip04_mdio_write(bus, i, 22, 0);
>>> +               temp = hip04_mdio_read(bus, i, MII_BMCR);
>>> +               temp |= BMCR_RESET;
>>> +               err = hip04_mdio_write(bus, i, MII_BMCR, temp);
>>> +               if (err < 0)
>>> +                       return err;
>>> +       }
>>> +
>>> +       mdelay(500);
>>
>> This does not look correct, you should iterate over all possible PHYs:
>> PHY_MAX_ADDR instead of hardcoding the loop to 2.
>
> OK, got it.
> 2 was used since there are only 2 PHYs on the board; will use PHY_MAX_ADDR instead.
>>
>> I think we might want to remove the mdio bus reset callback in general
>> as the PHY library should already take care of software resetting the
>> PHY to put it in a sane state, as well as waiting for the appropriate
>> delay before using, unlike here, where you do not poll for BMCR_RESET
>> to be cleared by the PHY.
>>
> Do you mean moving BMCR_RESET handling to common code? That would be great.
> The mdio_reset is added here so the phy_id can be read; otherwise the
> phy_id cannot be read and PHY detection fails.

Oh I see, thanks for mentioning that. This is not properly covered
here today, as we need to get the PHY id before we assign it a
phy_device structure; let me cook a patch for this.
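The polling step the review asks for could look like this sketch; the fake PHY register below clears BMCR_RESET after a few reads, standing in for real MDIO accesses:

```c
#include <assert.h>

#define BMCR_RESET 0x8000

/* Fake PHY: BMCR_RESET self-clears after a few reads. */
static int fake_bmcr = BMCR_RESET;
static int reads_until_clear = 3;

static int mdio_read_bmcr(void)
{
	if (reads_until_clear && --reads_until_clear == 0)
		fake_bmcr &= ~BMCR_RESET;
	return fake_bmcr;
}

/* Poll until the PHY clears BMCR_RESET, instead of a blind mdelay(500).
 * Returns 0 on completion, -1 on timeout; a real driver would sleep
 * between polls (e.g. usleep_range()) and bound the total wait time. */
static int phy_wait_reset(int max_polls)
{
	while (max_polls--) {
		if (!(mdio_read_bmcr() & BMCR_RESET))
			return 0;
	}
	return -1;
}
```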
-- 
Florian

^ permalink raw reply	[flat|nested] 148+ messages in thread


* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-20 14:31           ` Arnd Bergmann
@ 2014-03-21  5:19               ` Zhangfei Gao
  -1 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-21  5:19 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Zhangfei Gao, netdev, David S. Miller, linux-arm-kernel,
	devicetree-u79uwXL29TY76Z2rM5mHXA

Dear Arnd

On Thu, Mar 20, 2014 at 10:31 PM, Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org> wrote:
>> > This also seems to encode knowledge about a particular implementation
>> > into the driver. Maybe it's better to add a property for the port
>> > mode?
>>
>> After checking Documentation/devicetree/bindings/net/ethernet.txt,
>> I think phy-mode is more suitable here.
>>
>>         switch (priv->phy_mode) {
>>         case PHY_INTERFACE_MODE_SGMII:
>>                 if (speed == SPEED_1000)
>>                         val = 8;
>>                 else
>>                         val = 7;
>>                 break;
>>         case PHY_INTERFACE_MODE_MII:
>>                 val = 1;        /* SPEED_100 */
>>                 break;
>>         default:
>>                 val = 0;
>>                 break;
>>         }
>>         writel_relaxed(val, priv->base + GE_PORT_MODE);
>>
>> probe:
>> priv->phy_mode = of_get_phy_mode(node);
>
> Yes, this looks better, but where does 'speed' come from now? I assume
> even in SGMII mode, you should allow autonegotiation and set this correctly
> from the PHY code. Is that what you do here?

Yes, for SGMII there is auto-negotiation; 'speed' comes from adjust_link.
In MII mode, I will directly set 100M at probe, since there is no
auto-negotiation.

>
>> >> +static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
>> >> +{
>> >> +     writel_relaxed(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
>> >> +}
>> >> +
>> >> +static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
>> >> +{
>> >> +     writel_relaxed(phys, ppebase + priv->port * 4 + PPE_CFG_RX_CFF_ADDR);
>> >> +}
>> >> +
>> >> +static u32 hip04_recv_cnt(struct hip04_priv *priv)
>> >> +{
>> >> +     return readl_relaxed(priv->base + PPE_HIS_RX_PKT_CNT);
>> >> +}
>> >
>> > At the very least, the hip04_set_xmit_desc() function needs to use 'writel'
>> > rather than 'writel_relaxed'. Otherwise data that is being sent out
>> > can be stuck in the CPU's write buffers and you send stale data on the wire.
>> >
>> > For the receive path, you may or may not need to use 'readl', depending
>> > on how DMA is serialized by this device. If you have MSI interrupts, the
>> > interrupt message should already do the serialization, but if you have
>> > edge or level triggered interrupts, you normally need to have one readl()
>> > from the device register between the IRQ and the data access.
>>
>> Really? I will update to readl/writel in xmit and hip04_rx_poll.
>> I was under the impression that *_relaxed should be used as much as
>> possible for better performance.
>
> Ok. The _relaxed() versions are really meant for people that understand
> the ordering requirements. The regular readl/writel accessors contain
> extra barriers to make them equivalent to what x86 does.

Thanks for the explanation.
So generally readl/writel should be used on critical paths where we
want the result to be immediately visible, like an irq handler?

>
>> >> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
>> >> +{
>> >> +     struct hip04_priv *priv = container_of(napi,
>> >> +                           struct hip04_priv, napi);
>> >> +     struct net_device *ndev = priv->ndev;
>> >> +     struct sk_buff *skb;
>> >> +     struct rx_desc *desc;
>> >> +     unsigned char *buf;
>> >> +     int rx = 0;
>> >> +     unsigned int cnt = hip04_recv_cnt(priv);
>> >> +     unsigned int len, tmp[16];
>> >> +
>> >> +     while (cnt) {
>> >> +             buf = priv->rx_buf[priv->rx_head];
>> >> +             skb = build_skb(buf, priv->rx_buf_size);
>> >> +             if (unlikely(!skb))
>> >> +                     net_dbg_ratelimited("build_skb failed\n");
>> >> +             dma_map_single(&ndev->dev, skb->data,
>> >> +                     RX_BUF_SIZE, DMA_FROM_DEVICE);
>> >> +             memcpy(tmp, skb->data, 64);
>> >> +             endian_change((void *)tmp, 64);
>> >> +             desc = (struct rx_desc *)tmp;
>> >> +             len = desc->pkt_len;
>> >
>> > The dma_map_single() seems misplaced here, for all I can tell, the
>> > data has already been transferred. Maybe you mean dma_unmap_single?
>> >
>> > I don't see why you copy 64 bytes out of the buffer using endian_change,
>> > rather than just looking at the first word, which seems to have the
>> > only value you are interested in.
>> Russell suggested using be16_to_cpu etc to replace memcpy.
>
> Right, but doesn't it have to be be32_to_cpu?

The reason I was frustrated before is that the structure mixes u16 and
u32 fields. Russell reminded me to change the structure layout, so
using be16_to_cpu for the u16 fields and be32_to_cpu for the u32
fields works out.

>
>> >> +static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
>> >> +{
>> >> +     struct hip04_priv *priv = netdev_priv(ndev);
>> >> +     int i;
>> >> +
>> >> +     priv->rx_buf_size = RX_BUF_SIZE +
>> >> +                         SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>> >> +
>> >> +     priv->desc_pool = dma_pool_create(DRV_NAME, d, sizeof(struct tx_desc),
>> >> +                             SKB_DATA_ALIGN(sizeof(struct tx_desc)), 0);
>> >> +     if (!priv->desc_pool)
>> >> +             return -ENOMEM;
>> >> +
>> >> +     for (i = 0; i < TX_DESC_NUM; i++) {
>> >> +             priv->td_ring[i] = dma_pool_alloc(priv->desc_pool,
>> >> +                                     GFP_ATOMIC, &priv->td_phys[i]);
>> >> +             if (!priv->td_ring[i])
>> >> +                     return -ENOMEM;
>> >> +     }
>> >
>> > Why do you create a dma pool here, when you do all the allocations upfront?
>> >
>> > It looks to me like you could simply turn the td_ring array of pointers
>> > to tx descriptors into a an array of tx descriptors (no pointers) and allocate
>> > that one using dma_alloc_coherent.
>>
>> The dma pool is used here mainly because of alignment:
>> the descriptor must be SKB_DATA_ALIGN-aligned,
>> so this is the simplest way:
>>         priv->desc_pool = dma_pool_create(DRV_NAME, d, sizeof(struct tx_desc),
>>                                 SKB_DATA_ALIGN(sizeof(struct tx_desc)), 0);
>
> dma_alloc_coherent() will actually give you PAGE_SIZE alignment, so that's
> still easier.
However, because of the alignment requirement, we cannot simply use
desc[i] to get each descriptor: with
desc = dma_alloc_coherent(d, size, &phys, GFP_KERNEL);
desc[i] is not what we want.
So I still prefer dma_pool_alloc here.
>
>> >
>> >
>> >> +     if (!ppebase) {
>> >> +             struct device_node *n;
>> >> +
>> >> +             n = of_find_compatible_node(NULL, NULL, "hisilicon,hip04-ppebase");
>> >> +             if (!n) {
>> >> +                     ret = -EINVAL;
>> >> +                     netdev_err(ndev, "not find hisilicon,ppebase\n");
>> >> +                     goto init_fail;
>> >> +             }
>> >> +             ppebase = of_iomap(n, 0);
>> >> +     }
>> >
>> > How about using syscon_regmap_lookup_by_phandle() here? That way, you can have
>> > a more generic abstraction of the ppe, and stick the port and id in there as
>> > well, e.g.
>> >
>> >         ppe-syscon = <&hip04ppe 1 4>; // ppe, port, id
>>
>> To be honest, I am not familiar with syscon_regmap_lookup_by_phandle.
>>
>> Florian has suggested
>>               ppe: ppe@28c0000 {
>>                         compatible = "hisilicon,hip04-ppe";
>>                         reg = <0x28c0000 0x10000>;
>>                         #address-cells = <1>;
>>                         #size-cells = <0>;
>>
>>                         eth0_port: port@1f {
>>                                 reg = <31>;
>
>                 minor comment: I'd use 0x1f for the reg too.
>
>>                         };
>>
>>                         eth1_port: port@0 {
>>                                 reg = <0>;
>>                         };
>>
>>                         eth2_port: port@8 {
>>                                 reg = <8>;
>>                         };
>>                 };
>>                 fe: ethernet@28b0000 {
>>                         compatible = "hisilicon,hip04-mac";
>>                         reg = <0x28b0000 0x10000>;
>>                         interrupts = <0 413 4>;
>>                         phy-mode = "mii";
>>                         port-handle = <&eth0_port>;
>>                 };
>> And the port info can be found from port-handle
>> n = of_parse_phandle(node, "port-handle", 0);
>> ret = of_property_read_u32(n, "reg", &priv->port);
>>
>> The id is the controller start channel in ppe, either use alias or
>> another property in the port.
>
> Yes, this seems fine as well, as long as you are sure that the ppe
> device is only used by hip04 ethernet devices, I think it
> would get messy if your hardware shares them with other units.

I checked syscon_regmap; it looks especially useful for accessing
common registers. Here the ppe is an ethernet "accelerator" used
specifically by this driver. Since only a few places access ppebase,
it may be simpler to access it directly.

>
> It's probably a little simpler to avoid the sub-nodes and instead do
>
>>               ppe: ppe@28c0000 {
>>                         compatible = "hisilicon,hip04-ppe";
>>                         reg = <0x28c0000 0x10000>;
>>                         #address-cells = <1>;
>>                         #size-cells = <0>;
>>                 };
>>                 fe: ethernet@28b0000 {
>>                         compatible = "hisilicon,hip04-mac";
>>                         reg = <0x28b0000 0x10000>;
>>                         interrupts = <0 413 4>;
>>                         phy-mode = "mii";
>>                         port-handle = <&ppe 31>;
>>                 };
>
> In the code, I would create a simple ppe driver that exports
> a few functions you need. In the ethernet driver probe() function,
> you should get a handle to the ppe using
>
>         /* look up "port-handle" property of the current device, find ppe and port */
>         struct hip04_ppe *ppe = hip04_ppe_get(dev->of_node);
>         if (IS_ERR(ppe))
>                 return PTR_ERR(ppe); /* this handles -EPROBE_DEFER */
>
> and then in other code you can just do
>
>         hip04_ppe_set_foo(priv->ppe, foo_config);
>
> This is a somewhat more high-level abstraction than syscon, which
> just gives you a 'struct regmap' structure for doing register-level
> configuration like you have today.

Thanks
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 148+ messages in thread


* Re: [PATCH 2/3] net: hisilicon: new hip04 MDIO driver
  2014-03-20 17:59           ` Florian Fainelli
@ 2014-03-21  5:27             ` Zhangfei Gao
  -1 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-21  5:27 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Zhangfei Gao, netdev, David S. Miller, linux-arm-kernel, devicetree

On Fri, Mar 21, 2014 at 1:59 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> 2014-03-20 3:53 GMT-07:00 Zhangfei Gao <zhangfei.gao@gmail.com>:
>>>> +static int hip04_mdio_reset(struct mii_bus *bus)
>>>> +{
>>>> +       int temp, err, i;
>>>> +
>>>> +       for (i = 0; i < 2; i++) {
>>>> +               hip04_mdio_write(bus, i, 22, 0);
>>>> +               temp = hip04_mdio_read(bus, i, MII_BMCR);
>>>> +               temp |= BMCR_RESET;
>>>> +               err = hip04_mdio_write(bus, i, MII_BMCR, temp);
>>>> +               if (err < 0)
>>>> +                       return err;
>>>> +       }
>>>> +
>>>> +       mdelay(500);
>>>
>>> This does not look correct, you should iterate over all possible PHYs:
>>> PHY_MAX_ADDR instead of hardcoding the loop to 2.
>>
>> OK, got it.
>> Using 2 is because the board only has 2 PHYs; will use PHY_MAX_ADDR instead.
>>>
>>> I think we might want to remove the mdio bus reset callback in general
>>> as the PHY library should already take care of software resetting the
>>> PHY to put it in a sane state, as well as waiting for the appropriate
>>> delay before using, unlike here, where you do not poll for BMCR_RESET
>>> to be cleared by the PHY.
>>>
>> Do you mean moving BMCR_RESET handling into common code? That would be great.
>> The mdio_reset is added here to read the phy_id; otherwise the phy_id
>> cannot be read and the PHY cannot be detected.
>
> Oh I see, thanks for mentioning that, this is not properly covered
> here today as we need to get the PHY id before we assign it a
> phy_device structure, let me cook a patch for this.
> --

Should I keep hip04_mdio_reset in the next submission or remove it directly?

Thanks

^ permalink raw reply	[flat|nested] 148+ messages in thread


* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-21  5:19               ` Zhangfei Gao
@ 2014-03-21  7:37                 ` Arnd Bergmann
  -1 siblings, 0 replies; 148+ messages in thread
From: Arnd Bergmann @ 2014-03-21  7:37 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Zhangfei Gao, Zhangfei Gao, devicetree, David S. Miller, netdev

On Friday 21 March 2014 13:19:07 Zhangfei Gao wrote:
> On Thu, Mar 20, 2014 at 10:31 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>
> > Yes, this looks better, but where does 'speed' come from now? I assume
> > even in SGMII mode, you should allow autonegotiation and set this correctly
> > from the PHY code. Is that what you do here?
> 
> Yes, for SGMII there is auto-negotiation; 'speed' comes from adjust_link.
> In MII mode, I will directly set 100M at probe, since there is no
> auto-negotiation.

Ok, good.

> >
> >> >> +static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
> >> >> +{
> >> >> +     writel_relaxed(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
> >> >> +}
> >> >> +
> >> >> +static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
> >> >> +{
> >> >> +     writel_relaxed(phys, ppebase + priv->port * 4 + PPE_CFG_RX_CFF_ADDR);
> >> >> +}
> >> >> +
> >> >> +static u32 hip04_recv_cnt(struct hip04_priv *priv)
> >> >> +{
> >> >> +     return readl_relaxed(priv->base + PPE_HIS_RX_PKT_CNT);
> >> >> +}
> >> >
> >> > At the very least, the hip04_set_xmit_desc() function needs to use 'writel'
> >> > rather than 'writel_relaxed'. Otherwise data that is being sent out
> >> > can be stuck in the CPU's write buffers and you send stale data on the wire.
> >> >
> >> > For the receive path, you may or may not need to use 'readl', depending
> >> > on how DMA is serialized by this device. If you have MSI interrupts, the
> >> > interrupt message should already do the serialization, but if you have
> >> > edge or level triggered interrupts, you normally need to have one readl()
> >> > from the device register between the IRQ and the data access.
> >>
> >> Really? I will update to readl/writel in xmit and hip04_rx_poll.
> >> I was under the impression that *_relaxed should be used as much as
> >> possible for better performance.
> >
> > Ok. The _relaxed() versions are really meant for people that understand
> > the ordering requirements. The regular readl/writel accessors contain
> > extra barriers to make them equivalent to what x86 does.
> 
> Thanks for the explanation.
> So generally readl/writel should be used on critical paths where we
> want the result to be immediately visible, like an irq handler?

Strictly speaking, you only need one writel after preparing an
outbound DMA buffer, and one readl before reading an inbound
DMA buffer.

> >> >> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
> >> >> +{
> >> >> +     struct hip04_priv *priv = container_of(napi,
> >> >> +                           struct hip04_priv, napi);
> >> >> +     struct net_device *ndev = priv->ndev;
> >> >> +     struct sk_buff *skb;
> >> >> +     struct rx_desc *desc;
> >> >> +     unsigned char *buf;
> >> >> +     int rx = 0;
> >> >> +     unsigned int cnt = hip04_recv_cnt(priv);
> >> >> +     unsigned int len, tmp[16];
> >> >> +
> >> >> +     while (cnt) {
> >> >> +             buf = priv->rx_buf[priv->rx_head];
> >> >> +             skb = build_skb(buf, priv->rx_buf_size);
> >> >> +             if (unlikely(!skb))
> >> >> +                     net_dbg_ratelimited("build_skb failed\n");
> >> >> +             dma_map_single(&ndev->dev, skb->data,
> >> >> +                     RX_BUF_SIZE, DMA_FROM_DEVICE);
> >> >> +             memcpy(tmp, skb->data, 64);
> >> >> +             endian_change((void *)tmp, 64);
> >> >> +             desc = (struct rx_desc *)tmp;
> >> >> +             len = desc->pkt_len;
> >> >
> >> > The dma_map_single() seems misplaced here, for all I can tell, the
> >> > data has already been transferred. Maybe you mean dma_unmap_single?
> >> >
> >> > I don't see why you copy 64 bytes out of the buffer using endian_change,
> >> > rather than just looking at the first word, which seems to have the
> >> > only value you are interested in.
> >> Russell suggested using be16_to_cpu etc to replace memcpy.
> >
> > Right, but doesn't it have to be be32_to_cpu?
> 
> The reason I was frustrated before is that the structure mixes u16 and
> u32 fields. Russell reminded me to change the structure layout, so
> using be16_to_cpu for the u16 fields and be32_to_cpu for the u32
> fields works out.

Ok. The endian_change() function did multiple 32-bit swaps, so I assumed
that was what the design required. Swapping each field individually with
its actual length is more sensible though, so if that works, it is probably
the correct solution and the endian_change() function was in fact wrong.

> >> >> +static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
> >> >> +{
> >> >> +     struct hip04_priv *priv = netdev_priv(ndev);
> >> >> +     int i;
> >> >> +
> >> >> +     priv->rx_buf_size = RX_BUF_SIZE +
> >> >> +                         SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> >> >> +
> >> >> +     priv->desc_pool = dma_pool_create(DRV_NAME, d, sizeof(struct tx_desc),
> >> >> +                             SKB_DATA_ALIGN(sizeof(struct tx_desc)), 0);
> >> >> +     if (!priv->desc_pool)
> >> >> +             return -ENOMEM;
> >> >> +
> >> >> +     for (i = 0; i < TX_DESC_NUM; i++) {
> >> >> +             priv->td_ring[i] = dma_pool_alloc(priv->desc_pool,
> >> >> +                                     GFP_ATOMIC, &priv->td_phys[i]);
> >> >> +             if (!priv->td_ring[i])
> >> >> +                     return -ENOMEM;
> >> >> +     }
> >> >
> >> > Why do you create a dma pool here, when you do all the allocations upfront?
> >> >
> >> > It looks to me like you could simply turn the td_ring array of pointers
> >> > to tx descriptors into an array of tx descriptors (no pointers) and allocate
> >> > that one using dma_alloc_coherent.
> >>
> >> The dma pool is used here mainly because of alignment:
> >> the descriptor has an SKB_DATA_ALIGN alignment requirement,
> >> so this is the simplest way:
> >>         priv->desc_pool = dma_pool_create(DRV_NAME, d, sizeof(struct tx_desc),
> >>                                 SKB_DATA_ALIGN(sizeof(struct tx_desc)), 0);
> >
> > dma_alloc_coherent() will actually give you PAGE_SIZE alignment, so that's
> > still easier.
> However, because of the alignment requirement, we cannot simply use desc[i]
> to get each descriptor.
> desc = dma_alloc_coherent(d, size, &phys, GFP_KERNEL);
> desc[i] is not what we want.
> So I still prefer using dma_pool_alloc here.

Ah, I see what you mean: struct tx_desc is actually smaller than the
required alignment. You can fix that by marking the structure
"____cacheline_aligned". 

> >>                         };
> >>
> >>                         eth1_port: port@0 {
> >>                                 reg = <0>;
> >>                         };
> >>
> >>                         eth2_port: port@8 {
> >>                                 reg = <8>;
> >>                         };
> >>                 };
> >>                 fe: ethernet@28b0000 {
> >>                         compatible = "hisilicon,hip04-mac";
> >>                         reg = <0x28b0000 0x10000>;
> >>                         interrupts = <0 413 4>;
> >>                         phy-mode = "mii";
> >>                         port-handle = <&eth0_port>;
> >>                 };
> >> And the port info can be found from port-handle
> >> n = of_parse_phandle(node, "port-handle", 0);
> >> ret = of_property_read_u32(n, "reg", &priv->port);
> >>
> >> The id is the controller's start channel in the ppe; either an alias or
> >> another property in the port can provide it.
> >
> > Yes, this seems fine as well, as long as you are sure that the ppe
> > device is only used by hip04 ethernet devices, I think it
> > would get messy if your hardware shares them with other units.
> 
> I checked syscon_regmap; it looks especially useful for accessing
> common registers.
> Here the ppe is an ethernet "accelerator", used specifically in this driver.
> Since only a few places access the ppebase, it may be
> simpler to access it directly.

Right. I was just bringing it up because some similar designs
(e.g. TI Keystone or APM X-Gene) share an accelerator between
multiple users, not just the ethernet devices, and in that case
you might need a more general solution.

	Arnd

^ permalink raw reply	[flat|nested] 148+ messages in thread


* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-21  7:37                 ` Arnd Bergmann
@ 2014-03-21  7:56                   ` Zhangfei Gao
  -1 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-21  7:56 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Zhangfei Gao, netdev, David S. Miller, linux-arm-kernel, devicetree

Dear Arnd

On Fri, Mar 21, 2014 at 3:37 PM, Arnd Bergmann <arnd@arndb.de> wrote:

>> >> >> +static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
>> >> >> +{
>> >> >> +     struct hip04_priv *priv = netdev_priv(ndev);
>> >> >> +     int i;
>> >> >> +
>> >> >> +     priv->rx_buf_size = RX_BUF_SIZE +
>> >> >> +                         SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>> >> >> +
>> >> >> +     priv->desc_pool = dma_pool_create(DRV_NAME, d, sizeof(struct tx_desc),
>> >> >> +                             SKB_DATA_ALIGN(sizeof(struct tx_desc)), 0);
>> >> >> +     if (!priv->desc_pool)
>> >> >> +             return -ENOMEM;
>> >> >> +
>> >> >> +     for (i = 0; i < TX_DESC_NUM; i++) {
>> >> >> +             priv->td_ring[i] = dma_pool_alloc(priv->desc_pool,
>> >> >> +                                     GFP_ATOMIC, &priv->td_phys[i]);
>> >> >> +             if (!priv->td_ring[i])
>> >> >> +                     return -ENOMEM;
>> >> >> +     }
>> >> >
>> >> > Why do you create a dma pool here, when you do all the allocations upfront?
>> >> >
>> >> > It looks to me like you could simply turn the td_ring array of pointers
>> >> > to tx descriptors into an array of tx descriptors (no pointers) and allocate
>> >> > that one using dma_alloc_coherent.
>> >>
>> >> The dma pool is used here mainly because of alignment:
>> >> the descriptor has an SKB_DATA_ALIGN alignment requirement,
>> >> so this is the simplest way:
>> >>         priv->desc_pool = dma_pool_create(DRV_NAME, d, sizeof(struct tx_desc),
>> >>                                 SKB_DATA_ALIGN(sizeof(struct tx_desc)), 0);
>> >
>> > dma_alloc_coherent() will actually give you PAGE_SIZE alignment, so that's
>> > still easier.
>> However, because of the alignment requirement, we cannot simply use desc[i]
>> to get each descriptor.
>> desc = dma_alloc_coherent(d, size, &phys, GFP_KERNEL);
>> desc[i] is not what we want.
>> So I still prefer using dma_pool_alloc here.
>
> Ah, I see what you mean: struct tx_desc is actually smaller than the
> required alignment. You can fix that by marking the structure
> "____cacheline_aligned".
>

Yes, after rechecking the method carefully, it works with __aligned(0x40);
"____cacheline_aligned" is much better.
The descriptors can then be accessed either as an array or through a pointer.

Will update accordingly, it is simpler.

Thanks

^ permalink raw reply	[flat|nested] 148+ messages in thread


* Re: [PATCH 1/3] Documentation: add Device tree bindings for Hisilicon hip04 ethernet
  2014-03-18 12:34     ` Mark Rutland
@ 2014-03-21 12:59         ` Zhangfei Gao
  -1 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-21 12:59 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Zhangfei Gao, netdev, David S. Miller, linux-arm-kernel, devicetree

Dear Mark

Thanks for the suggestion, will update accordingly.

On Tue, Mar 18, 2014 at 8:34 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Tue, Mar 18, 2014 at 08:40:15AM +0000, Zhangfei Gao wrote:
>> This patch adds the Device Tree bindings for the Hisilicon hip04
>> Ethernet controller, including 100M / 1000M controller.
>>
>> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
>> ---
>>  .../bindings/net/hisilicon-hip04-net.txt           |   74 ++++++++++++++++++++
>>  1 file changed, 74 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
>>
>> diff --git a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
>> new file mode 100644
>> index 0000000..c918f08
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
>> @@ -0,0 +1,74 @@
>> +Hisilicon hip04 Ethernet Controller
>> +
>> +* Ethernet controller node
>> +
>> +Required properties:
>> +- compatible: should be "hisilicon,hip04-mac".
>> +- reg: address and length of the register set for the device.
>> +- interrupts: interrupt for the device.
>> +- port: ppe port number connected to the controller: range from 0 to 31.
>
> ppe?
>
> Will there ever be more than one ppe? If so, describing the linkage to
> the ppe with a phandle + args approach is preferable.
>
>> +- speed: 100 (100M) or 1000 (1000M).
>
> Can you not query this from the hardware?

Will remove speed.
>
>> +- id: should be different and fe should be 0.
>
> This description is useless.
>
> What is this for, and why does this need to be in the dt? What is "fe"?

Use alias instead.
>
>> +
>> +Optional Properties:
>> +- phy-handle : the phandle to a PHY node
>> +
>> +
>> +* Ethernet ppe node: control rx & tx fifos of all ethernet controllers
>> +
>> +Required properties:
>> +- compatible: should be "hisilicon,hip04-ppebase".
>
> Why "ppebase" rather than "ppe"?
>
>> +- reg: address and length of the register set for the node.
>
> s/node/device/

OK.

Thanks
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 148+ messages in thread


* Re: [PATCH 1/3] Documentation: add Device tree bindings for Hisilicon hip04 ethernet
  2014-03-18 12:51     ` Sergei Shtylyov
@ 2014-03-21 13:04       ` Zhangfei Gao
  -1 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-21 13:04 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Zhangfei Gao, David S. Miller, netdev, linux-arm-kernel, devicetree

Dear Sergei

Thanks for the kind suggestion, will update.

On Tue, Mar 18, 2014 at 8:51 PM, Sergei Shtylyov
<sergei.shtylyov@cogentembedded.com> wrote:

>> diff --git a/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
>> b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
>> new file mode 100644
>> index 0000000..c918f08
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/net/hisilicon-hip04-net.txt
>> @@ -0,0 +1,74 @@
>> +Hisilicon hip04 Ethernet Controller
>> +
>> +* Ethernet controller node
>> +
>> +Required properties:
>> +- compatible: should be "hisilicon,hip04-mac".
>> +- reg: address and length of the register set for the device.
>> +- interrupts: interrupt for the device.
>> +- port: ppe port number connected to the controller: range from 0 to 31.
>> +- speed: 100 (100M) or 1000 (1000M).
>
>
>    There's standard "max-speed" property for that, see
> Documentation/devicetree/bindings/net/ethernet.txt in the 'net-next.git'
> repo.
>
>
>> +Optional Properties:
>> +- phy-handle : the phandle to a PHY node
>
>
>    Please refer instead to the above mentioned file for this standard
> property -- it is already described there. See other binding files as the
> example.

Yes, Documentation/devicetree/bindings/net/ethernet.txt defines
many useful properties.
Will refer to it directly.
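
For instance, a sketch of the node using the standard properties from ethernet.txt (register values copied from earlier in this thread; the `phy0` label is hypothetical):

```dts
fe: ethernet@28b0000 {
	compatible = "hisilicon,hip04-mac";
	reg = <0x28b0000 0x10000>;
	interrupts = <0 413 4>;
	/* standard properties from bindings/net/ethernet.txt */
	phy-mode = "mii";
	max-speed = <100>;
	phy-handle = <&phy0>;	/* phy0: hypothetical PHY node */
};
```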

Thanks

^ permalink raw reply	[flat|nested] 148+ messages in thread


* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-20 14:31           ` Arnd Bergmann
@ 2014-03-24  8:17             ` Zhangfei Gao
  -1 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-24  8:17 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Zhangfei Gao, netdev, David S. Miller, linux-arm-kernel, devicetree

Dear Arnd

On Thu, Mar 20, 2014 at 10:31 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thursday 20 March 2014, Zhangfei Gao wrote:
>> On Tue, Mar 18, 2014 at 7:25 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> > On Tuesday 18 March 2014 16:40:17 Zhangfei Gao wrote:
>> >
>> >> +
>> >> +static void __iomem *ppebase;
>> >
>> > The global 'ppebase' seems hacky. Isn't that a SoC-specific register area, while
>> > the rest of the driver is reusable across SoCs?
>> >
>> > What does 'ppe' stand for?
>> >
>> > What if there are multiple instances of this, which each have their own ppebase?
>>
>> In this specific platform,
>> ppe is the only module controlling all the fifos for all the net controllers.
>> And each controller connects to a specific port.
>> The ppe has 2048 channels, shared by all the controllers, as long as they do not overlap.
>>
>> So the static ppebase is required, which I don't like either.
>> Two inputs are required: one is the port, which is connected to the controller.
>> The other is the start channel; currently I use id, and the start channel is
>> RX_DESC_NUM * priv->id;  /* start_addr */
>
> Ok, thanks for the explanation!
>
I thought you were fine with "static void __iomem *ppebase" here.

>> >
>> >> +     if (!ppebase) {
>> >> +             struct device_node *n;
>> >> +
>> >> +             n = of_find_compatible_node(NULL, NULL, "hisilicon,hip04-ppebase");
>> >> +             if (!n) {
>> >> +                     ret = -EINVAL;
>> >> +                     netdev_err(ndev, "not find hisilicon,ppebase\n");
>> >> +                     goto init_fail;
>> >> +             }
>> >> +             ppebase = of_iomap(n, 0);
>> >> +     }
>> >
>> > How about using syscon_regmap_lookup_by_phandle() here? That way, you can have
>> > a more generic abstraction of the ppe, and stick the port and id in there as
>> > well, e.g.
>> >
>> >         ppe-syscon = <&hip04ppe 1 4>; // ppe, port, id

Even when using syscon_regmap_lookup_by_phandle, there would still be a
static struct regmap, since the three controllers
share one regmap.

> It's probably a little simpler to avoid the sub-nodes and instead do
>
>>               ppe: ppe@28c0000 {
>>                         compatible = "hisilicon,hip04-ppe";
>>                         reg = <0x28c0000 0x10000>;
>>                         #address-cells = <1>;
>>                         #size-cells = <0>;
>>                 };
>>                 fe: ethernet@28b0000 {
>>                         compatible = "hisilicon,hip04-mac";
>>                         reg = <0x28b0000 0x10000>;
>>                         interrupts = <0 413 4>;
>>                         phy-mode = "mii";
>>                         port-handle = <&ppe 31>;
>>                 };
>
> In the code, I would create a simple ppe driver that exports
> a few functions. you need. In the ethernet driver probe() function,
> you should get a handle to the ppe using
>
>         /* look up "port-handle" property of the current device, find ppe and port */
>         struct hip04_ppe *ppe = hip04_ppe_get(dev->of_node);
>         if (IS_ERR(ppe))
>                 return PTR_ERR(ptr); /* this handles -EPROBE_DEFER */
>
> and then in other code you can just do
>
>         hip04_ppe_set_foo(priv->ppe, foo_config);
>
> This is a somewhat more high-level abstraction that syscon, which
> just gives you a 'struct regmap' structure for doing register-level
> configuration like you have today.
>

Do you mean creating an additional file like ppe.c with some exported
functions to remove the static ppebase?
Since the ppe is specifically bound to the ethernet and is not used
anywhere else,
the exported functions would not be used anywhere else.
Wouldn't it make things more complicated, since there are probe, remove, etc.?

So I may still prefer using "static void __iomem *ppebase", as it is simpler.

Thanks

^ permalink raw reply	[flat|nested] 148+ messages in thread

* [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
@ 2014-03-24  8:17             ` Zhangfei Gao
  0 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-24  8:17 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Arnd

On Thu, Mar 20, 2014 at 10:31 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Thursday 20 March 2014, Zhangfei Gao wrote:
>> On Tue, Mar 18, 2014 at 7:25 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> > On Tuesday 18 March 2014 16:40:17 Zhangfei Gao wrote:
>> >
>> >> +
>> >> +static void __iomem *ppebase;
>> >
>> > The global 'ppebase' seems hacky. Isn't that a SoC-specific register area, while
>> > the rest of the driver is reusable across SoCs?
>> >
>> > What does 'ppe' stand for?
>> >
>> > What if there are multiple instances of this, which each have their own ppebase?
>>
>> In this specific platform,
>> ppe is the only module controlling all the fifos for all the net controller.
>> And each controller connect to specific port.
>> ppe has 2048 channels, sharing by all the controller, only if not overlapped.
>>
>> So the static ppebase is required, which I don't like too.
>> Two inputs required, one is port, which is connect to the controller.
>> The other is start channel, currently I use id, and start channel is
>> RX_DESC_NUM * priv->id;  /* start_addr */
>
> Ok, thanks for the explanation!
>
I thought you are fine with "static void __iomem *ppebase" here.

>> >
>> >> +     if (!ppebase) {
>> >> +             struct device_node *n;
>> >> +
>> >> +             n = of_find_compatible_node(NULL, NULL, "hisilicon,hip04-ppebase");
>> >> +             if (!n) {
>> >> +                     ret = -EINVAL;
>> >> +                     netdev_err(ndev, "not find hisilicon,ppebase\n");
>> >> +                     goto init_fail;
>> >> +             }
>> >> +             ppebase = of_iomap(n, 0);
>> >> +     }
>> >
>> > How about using syscon_regmap_lookup_by_phandle() here? That way, you can have
>> > a more generic abstraction of the ppe, and stick the port and id in there as
>> > well, e.g.
>> >
>> >         ppe-syscon = <&hip04ppe 1 4>; // ppe, port, id

Even if using syscon_regmap_lookup_by_phandle, there still have static
struct regmap, since three controllers
share one regmap.

> It's probably a little simpler to avoid the sub-nodes and instead do
>
>>               ppe: ppe@28c0000 {
>>                         compatible = "hisilicon,hip04-ppe";
>>                         reg = <0x28c0000 0x10000>;
>>                         #address-cells = <1>;
>>                         #size-cells = <0>;
>>                 };
>>                 fe: ethernet@28b0000 {
>>                         compatible = "hisilicon,hip04-mac";
>>                         reg = <0x28b0000 0x10000>;
>>                         interrupts = <0 413 4>;
>>                         phy-mode = "mii";
>>                         port-handle = <&ppe 31>;
>>                 };
>
> In the code, I would create a simple ppe driver that exports
> a few functions. you need. In the ethernet driver probe() function,
> you should get a handle to the ppe using
>
>         /* look up "port-handle" property of the current device, find ppe and port */
>         struct hip04_ppe *ppe = hip04_ppe_get(dev->of_node);
>         if (IS_ERR(ppe))
>                 return PTR_ERR(ptr); /* this handles -EPROBE_DEFER */
>
> and then in other code you can just do
>
>         hip04_ppe_set_foo(priv->ppe, foo_config);
>
> This is a somewhat more high-level abstraction that syscon, which
> just gives you a 'struct regmap' structure for doing register-level
> configuration like you have today.
>

Do you mean creating an additional file like ppe.c, with some exported
functions, to remove the static ppebase?
Since the ppe is specifically bound to the ethernet controllers and is
not used anywhere else,
the exported functions may not be used anywhere else.
Doesn't it make things more complicated, since there are probe, remove, etc.?

So I may still prefer using "static void __iomem *ppebase", as it is simpler.

Thanks

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-24  8:17             ` Zhangfei Gao
@ 2014-03-24 10:02               ` Arnd Bergmann
  -1 siblings, 0 replies; 148+ messages in thread
From: Arnd Bergmann @ 2014-03-24 10:02 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Zhangfei Gao, Zhangfei Gao, devicetree, David S. Miller, netdev

On Monday 24 March 2014 16:17:42 Zhangfei Gao wrote:
> On Thu, Mar 20, 2014 at 10:31 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> >> >
> >> >> +     if (!ppebase) {
> >> >> +             struct device_node *n;
> >> >> +
> >> >> +             n = of_find_compatible_node(NULL, NULL, "hisilicon,hip04-ppebase");
> >> >> +             if (!n) {
> >> >> +                     ret = -EINVAL;
> >> >> +                     netdev_err(ndev, "not find hisilicon,ppebase\n");
> >> >> +                     goto init_fail;
> >> >> +             }
> >> >> +             ppebase = of_iomap(n, 0);
> >> >> +     }
> >> >
> >> > How about using syscon_regmap_lookup_by_phandle() here? That way, you can have
> >> > a more generic abstraction of the ppe, and stick the port and id in there as
> >> > well, e.g.
> >> >
> >> >         ppe-syscon = <&hip04ppe 1 4>; // ppe, port, id
> 
> Even if using syscon_regmap_lookup_by_phandle, there still have static
> struct regmap, since three controllers
> share one regmap.

The regmap is then owned by the syscon driver, while each controller takes
a reference to the regmap that it can store in its own private data
structure. However, as we discussed using a ppe driver sounds nicer than
regmap.

> > It's probably a little simpler to avoid the sub-nodes and instead do
> >
> >>               ppe: ppe@28c0000 {
> >>                         compatible = "hisilicon,hip04-ppe";
> >>                         reg = <0x28c0000 0x10000>;
> >>                         #address-cells = <1>;
> >>                         #size-cells = <0>;
> >>                 };
> >>                 fe: ethernet@28b0000 {
> >>                         compatible = "hisilicon,hip04-mac";
> >>                         reg = <0x28b0000 0x10000>;
> >>                         interrupts = <0 413 4>;
> >>                         phy-mode = "mii";
> >>                         port-handle = <&ppe 31>;
> >>                 };
> >
> > In the code, I would create a simple ppe driver that exports
> > a few functions. you need. In the ethernet driver probe() function,
> > you should get a handle to the ppe using
> >
> >         /* look up "port-handle" property of the current device, find ppe and port */
> >         struct hip04_ppe *ppe = hip04_ppe_get(dev->of_node);
> >         if (IS_ERR(ppe))
> >                 return PTR_ERR(ptr); /* this handles -EPROBE_DEFER */
> >
> > and then in other code you can just do
> >
> >         hip04_ppe_set_foo(priv->ppe, foo_config);
> >
> > This is a somewhat more high-level abstraction that syscon, which
> > just gives you a 'struct regmap' structure for doing register-level
> > configuration like you have today.
> >
> 
> Do you mean create one additional file like ppe.c with some exported
> function to remove the static ppebase?

It doesn't have to be a separate file, as long as you register a
separate platform_driver for the ppe.

> Since the ppe is specifically bounded with ethernet, and does not used
> anywhere else,
> the exported function may not be used anywhere else.
> Is it make it more complicated since there are probe, remove etc.
> 
> So I may still prefer using "static void __iomem *ppebase", as it is simpler.

The trouble is that the driver should not rely on being only there
for a single instance, that's not how we write drivers.

I'm fine with either a syscon instance (which would be simpler) or a
separate ppe driver as part of the hip04-mac driver (which would be
a nicer abstraction).

	Arnd

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-24 10:02               ` Arnd Bergmann
@ 2014-03-24 13:23                 ` Zhangfei Gao
  -1 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-24 13:23 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, Zhangfei Gao, devicetree, David S. Miller, netdev

On Mon, Mar 24, 2014 at 6:02 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Monday 24 March 2014 16:17:42 Zhangfei Gao wrote:
>> On Thu, Mar 20, 2014 at 10:31 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> >> >
>> >> >> +     if (!ppebase) {
>> >> >> +             struct device_node *n;
>> >> >> +
>> >> >> +             n = of_find_compatible_node(NULL, NULL, "hisilicon,hip04-ppebase");
>> >> >> +             if (!n) {
>> >> >> +                     ret = -EINVAL;
>> >> >> +                     netdev_err(ndev, "not find hisilicon,ppebase\n");
>> >> >> +                     goto init_fail;
>> >> >> +             }
>> >> >> +             ppebase = of_iomap(n, 0);
>> >> >> +     }
>> >> >
>> >> > How about using syscon_regmap_lookup_by_phandle() here? That way, you can have
>> >> > a more generic abstraction of the ppe, and stick the port and id in there as
>> >> > well, e.g.
>> >> >
>> >> >         ppe-syscon = <&hip04ppe 1 4>; // ppe, port, id
>>
>> Even if using syscon_regmap_lookup_by_phandle, there still have static
>> struct regmap, since three controllers
>> share one regmap.
>
> The regmap is then owned by the syscon driver, while each controller takes
> a reference to the regmap that it can store in its own private data
> structure. However, as we discussed using a ppe driver sounds nicer than
> regmap.
>
>> > It's probably a little simpler to avoid the sub-nodes and instead do
>> >
>> >>               ppe: ppe@28c0000 {
>> >>                         compatible = "hisilicon,hip04-ppe";
>> >>                         reg = <0x28c0000 0x10000>;
>> >>                         #address-cells = <1>;
>> >>                         #size-cells = <0>;
>> >>                 };
>> >>                 fe: ethernet@28b0000 {
>> >>                         compatible = "hisilicon,hip04-mac";
>> >>                         reg = <0x28b0000 0x10000>;
>> >>                         interrupts = <0 413 4>;
>> >>                         phy-mode = "mii";
>> >>                         port-handle = <&ppe 31>;
>> >>                 };
>> >
>> > In the code, I would create a simple ppe driver that exports
>> > a few functions. you need. In the ethernet driver probe() function,
>> > you should get a handle to the ppe using
>> >
>> >         /* look up "port-handle" property of the current device, find ppe and port */
>> >         struct hip04_ppe *ppe = hip04_ppe_get(dev->of_node);
>> >         if (IS_ERR(ppe))
>> >                 return PTR_ERR(ptr); /* this handles -EPROBE_DEFER */
>> >
>> > and then in other code you can just do
>> >
>> >         hip04_ppe_set_foo(priv->ppe, foo_config);
>> >
>> > This is a somewhat more high-level abstraction that syscon, which
>> > just gives you a 'struct regmap' structure for doing register-level
>> > configuration like you have today.
>> >
>>
>> Do you mean create one additional file like ppe.c with some exported
>> function to remove the static ppebase?
>
> It doesn't have to be a separate file, as long as you register a
> separate platform_driver for the ppe.
>
>> Since the ppe is specifically bounded with ethernet, and does not used
>> anywhere else,
>> the exported function may not be used anywhere else.
>> Is it make it more complicated since there are probe, remove etc.
>>
>> So I may still prefer using "static void __iomem *ppebase", as it is simpler.
>
> The trouble is that the driver should not rely on being only there
> for a single instance, that's not how we write drivers.
>
> I'm fine with either a syscon instance (which would be simpler) or a
> separate ppe driver as part of the hip04-mac driver (which would be
> a nicer abstraction).
>

Understand now.
Will update with syscon, as it is simpler.
Thanks for your patience.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-20  9:51           ` Zhangfei Gao
@ 2014-03-24 14:17             ` Rob Herring
  -1 siblings, 0 replies; 148+ messages in thread
From: Rob Herring @ 2014-03-24 14:17 UTC (permalink / raw)
  To: Zhangfei Gao
  Cc: Russell King - ARM Linux, Zhangfei Gao, devicetree,
	David S. Miller, linux-arm-kernel, netdev

On Thu, Mar 20, 2014 at 4:51 AM, Zhangfei Gao <zhangfei.gao@gmail.com> wrote:
> Dear Russell
>
> Thanks for sparing time and giving so many perfect suggestion, really helpful.
>
> On Tue, Mar 18, 2014 at 6:46 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
>> I was just browsing this patch when I noticed some of these issues - I
>> haven't done a full review of this driver, I'm just commenting on the
>> things I've spotted.

[snip]

>>> +             dma_map_single(&ndev->dev, skb->data,
>>> +                     RX_BUF_SIZE, DMA_FROM_DEVICE);
>>
>> This is incorrect.
>>
>> buf = buffer alloc()
>> /* CPU owns buffer and can read/write it, device does not */
>> dev_addr = dma_map_single(dev, buf, ..., DMA_FROM_DEVICE);
>> /* Device owns buffer and can write it, CPU does not access it */
>> dma_unmap_single(dev, dev_addr, ..., DMA_FROM_DEVICE);
>> /* CPU owns buffer again and can read/write it, device does not */
>>
>> Please turn on DMA API debugging in the kernel debug options and verify
>> whether your driver causes it to complain (it will.)
>
> Yes, you are right.
> After change to dma_map/unmap_single, however, still get warning like
> "DMA-API: device driver failed to check map error", not sure whether
> it can be ignored?

If it could be ignored, there would be no warning. So yes you should
check the error. I guess correct error handling would be throwing away
the packet.


>>> +             dma_map_single(&ndev->dev, buf, RX_BUF_SIZE, DMA_TO_DEVICE);
>>> +             hip04_set_recv_desc(priv, virt_to_phys(buf));
>>
>> No need for virt_to_phys() here - dma_map_single() returns the device
>> address.
> Got it.
> Use virt_to_phys since find same result come out, it should be
> different for iommu case.
>
> In fact, the hardware can help to do the cache flushing, the function
> still not be enabled now.
> Then dma_map/unmap_single may be ignored.

If you don't need cache flushing, you should setup different
dma_map_ops for the device such as arm_coherent_dma_ops. The driver
should always have the dma_map calls. See highbank and mvebu for
examples.

Rob

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-24 14:17             ` Rob Herring
@ 2014-03-26 14:22               ` Zhangfei Gao
  -1 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-26 14:22 UTC (permalink / raw)
  To: Rob Herring
  Cc: Russell King - ARM Linux, Zhangfei Gao, devicetree,
	David S. Miller, linux-arm-kernel, netdev

Dear Rob

On Mon, Mar 24, 2014 at 10:17 PM, Rob Herring <robherring2@gmail.com> wrote:

>
>>>> +             dma_map_single(&ndev->dev, skb->data,
>>>> +                     RX_BUF_SIZE, DMA_FROM_DEVICE);
>>>
>>> This is incorrect.
>>>
>>> buf = buffer alloc()
>>> /* CPU owns buffer and can read/write it, device does not */
>>> dev_addr = dma_map_single(dev, buf, ..., DMA_FROM_DEVICE);
>>> /* Device owns buffer and can write it, CPU does not access it */
>>> dma_unmap_single(dev, dev_addr, ..., DMA_FROM_DEVICE);
>>> /* CPU owns buffer again and can read/write it, device does not */
>>>
>>> Please turn on DMA API debugging in the kernel debug options and verify
>>> whether your driver causes it to complain (it will.)
>>
>> Yes, you are right.
>> After change to dma_map/unmap_single, however, still get warning like
>> "DMA-API: device driver failed to check map error", not sure whether
>> it can be ignored?
>
> If it could be ignored, there would be no warning. So yes you should
> check the error. I guess correct error handling would be throwing away
> the packet.

The warning is solved by adding a dma_mapping_error() check after every
dma_map_single() call.

>
>
>>>> +             dma_map_single(&ndev->dev, buf, RX_BUF_SIZE, DMA_TO_DEVICE);
>>>> +             hip04_set_recv_desc(priv, virt_to_phys(buf));
>>>
>>> No need for virt_to_phys() here - dma_map_single() returns the device
>>> address.
>> Got it.
>> Use virt_to_phys since find same result come out, it should be
>> different for iommu case.
>>
>> In fact, the hardware can help to do the cache flushing, the function
>> still not be enabled now.
>> Then dma_map/unmap_single may be ignored.
>
> If you don't need cache flushing, you should setup different
> dma_map_ops for the device such as arm_coherent_dma_ops. The driver
> should always have the dma_map calls. See highbank and mvebu for
> examples.

That would be very helpful; I am still not sure how to enable this feature.
Could you clarify which file?
I could not find highbank and mvebu under drivers/iommu/.

Thanks

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-07 18:53     ` David Miller
@ 2014-04-18 13:17       ` zhangfei
  -1 siblings, 0 replies; 148+ messages in thread
From: zhangfei @ 2014-04-18 13:17 UTC (permalink / raw)
  To: David Miller
  Cc: linux, arnd, f.fainelli, sergei.shtylyov, mark.rutland,
	David.Laight, eric.dumazet, linux-arm-kernel, netdev, devicetree

Dear David

On 04/08/2014 02:53 AM, David Miller wrote:

>> +static void hip04_tx_reclaim(struct net_device *ndev, bool force)
>   ...
>> +static void hip04_xmit_timer(unsigned long data)
>> +{
>> +	struct net_device *ndev = (void *) data;
>> +
>> +	hip04_tx_reclaim(ndev, false);
>> +}
>   ...
>> +	mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
>
> And this is where I stop reading your driver, I've stated already that this
> kind of reclaim scheme is unacceptable.
>
> The kernel timers lack the granularity necessary to service TX reclaim
> with a reasonable amount of latency.
>
> You must use some kind of hardware notification of TX slots becomming
> available, I find it totally impossible that a modern ethernet controller
> was created without a TX done interrupt.
>

There is no tx_done interrupt, so we may need some workaround.

Is it acceptable to use poll to reclaim the transmitted buffers,
calling napi_schedule() in xmit?
Thanks

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-08  8:30         ` David Laight
@ 2014-04-08 14:47             ` zhangfei
  -1 siblings, 0 replies; 148+ messages in thread
From: zhangfei @ 2014-04-08 14:47 UTC (permalink / raw)
  To: David Laight, David Miller
  Cc: linux-lFZ/pmaqli7XmaaqVzeoHQ, arnd-r2nGTMty4D4,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w,
	sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
	mark.rutland-5wv7dgnIgG8, eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA

Dear David,

On 04/08/2014 04:30 PM, David Laight wrote:
> From: zhangfei [mailto:zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org]
>> On 04/08/2014 02:53 AM, David Miller wrote:
>>> From: Zhangfei Gao <zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
>>> Date: Sat,  5 Apr 2014 12:35:06 +0800
>>>
>>>> +struct tx_desc {
>>>> +	u32 send_addr;
>>>> +	u16 reserved_16;
>>>> +	u16 send_size;
>
> The above doesn't look right for endianness independence.
> I'd guess the hardware spec shows a 32bit word with the 'send size'
> in one half - that is what you need to define.
>
> Since this is a tx descriptor (and written by the host) you
> can't have 'reserved' field - the host has to write it.
> probably these are 'must be zero' fields.

Yes, it is not endianness-independent.
In fact, we swapped the layout of the two u16 fields to handle the
endianness switch.

The reserved_16 field is the part that is not used,
so it is simpler to define a u32 here.
If the upper 16 bits also needed to be set, we would usually still use a
u32 and compose the value dynamically, right?

>
>>>> +	u32 reserved_32;
>>>> +	u32 cfg;
>>>> +	u32 wb_addr;
>>>> +} ____cacheline_aligned;
>>>
>>> I do not think that ____cacheline_aligned is appropriate at all here.
>>>
>>> First of all, this is a hardware descriptor, so it has a fixed layout
>>> and therefore size.
>
> The structure also isn't even a multiple of a power of two.
> So there will be implicit padding at the end.
>
> Since there isn't a 'pointer to next' I presume the hardware accesses
> the descriptors from adjacent physical addresses.
> So you need to explicitly pad to that size.
> If the cache line size were 128 byte the above wouldn't work at all.
Yes, __aligned(64) can be used here, though I thought using 64 directly 
was not good.
The requirement is that the desc address be aligned to 0x40, since the 
desc physical address is written to a register that only uses bits [31:6].

>
>>>
>>> Secondly, unless you declare this object statically in the data section
>>> of the object file, the alignment doesn't matter.  These descriptors
>>> are always dynamically allocated, rather than instantiated in the
>>> kernel/driver image.
>>
>> The ____cacheline_aligned used here is only for the requirement of
>> alignment, and use dma_alloc_coherent, while at first dma_pool is used
>> for the requirement of alignment.
>> Otherwise desc[1] is not aligned and can not be used directly, the
>> structure is smaller.
>
> It sounds like you should be explicitly padding the structure
> to 32 bytes - whether or not that is the cache line size.
Got it, understand now.
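As a concrete illustration of the explicit-padding suggestion, a hypothetical userspace sketch in C11 could look like this (field names follow the posted patch, but the `pad` field and the fixed-width types are illustrative only, not the actual hardware layout):

```c
#include <assert.h>
#include <stdint.h>

/* tx_desc padded explicitly to 32 bytes, independent of the CPU cache
 * line size; the trailing pad field is an illustration, not hardware. */
struct tx_desc {
	uint32_t send_addr;
	uint16_t reserved_16;
	uint16_t send_size;
	uint32_t reserved_32;
	uint32_t cfg;
	uint32_t wb_addr;
	uint32_t pad[3];	/* explicit padding: 20 -> 32 bytes */
};

/* adjacent descriptors now sit exactly 32 bytes apart in an array */
_Static_assert(sizeof(struct tx_desc) == 32, "desc must be 32 bytes");
```

With this, `desc[1]` in a `dma_alloc_coherent()` buffer lands at the required offset without relying on `____cacheline_aligned`.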

>
> ...
>> I am sorry, but unfortunately this series really does NOT have TX done
>> interrupt after checked with hardware guy many times.
>> And next series will add TX done interrupt according to the feedback.
>>
>> There are two reasons of removing the TX done interrupt when the chip is
>> designed.
>> 1. The specific product does not care the latency, only care the throughput.
>> 2. When doing many experiment, the tx done interrupt will impact the
>> throughput, as a result reclaim is moved to xmit as one of
>> optimizations, then finally tx done interrupt is removed at all.
>>
>> Is it acceptable of removing timer as well as latency handling, or any
>> other work around of this kind of hardware?
>
> If you don't have a global 'TX done' interrupt, you need a per
> descriptor one.
> Otherwise you cannot send at maximum rate in the absence of
> receive traffic.
>

A global 'TX done' interrupt means an interrupt for a desc chain (several 
descs linked together), right?
There is no interrupt for either a desc chain or a single desc.

By the way, with a per-descriptor interrupt, can it be optimized like 
napi: disable the interrupt and only re-enable it once all buffers are 
reclaimed?
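The napi-like pattern being asked about could look roughly like this; every function and variable below is a stub standing in for driver and hardware operations, so this is only a sketch of the control flow, not the hip04 driver's code:

```c
#include <assert.h>

static int irq_enabled = 1;	/* stub for the hardware irq mask */
static int pending = 40;	/* stub: outstanding tx descriptors */

/* reclaim up to 'budget' completed descriptors, return how many */
static int reclaim_some(int budget)
{
	int done = pending < budget ? pending : budget;

	pending -= done;
	return done;
}

static void tx_done_irq(void)
{
	irq_enabled = 0;		/* mask further tx-done irqs */
	while (pending)
		reclaim_some(16);	/* poll with a budget, napi style */
	irq_enabled = 1;		/* unmask only once all reclaimed */
}
```

In a real driver the budgeted loop would live in a napi poll callback rather than the irq handler itself.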

Thanks
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-08  8:30         ` David Laight
@ 2014-04-08  9:42             ` Arnd Bergmann
  -1 siblings, 0 replies; 148+ messages in thread
From: Arnd Bergmann @ 2014-04-08  9:42 UTC (permalink / raw)
  To: David Laight
  Cc: 'zhangfei',
	David Miller, linux-lFZ/pmaqli7XmaaqVzeoHQ,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w,
	sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
	mark.rutland-5wv7dgnIgG8, eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA

On Tuesday 08 April 2014 08:30:37 David Laight wrote:
> From: zhangfei [mailto:zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org]
> > On 04/08/2014 02:53 AM, David Miller wrote:
> > > From: Zhangfei Gao <zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> > > Date: Sat,  5 Apr 2014 12:35:06 +0800
> > >
> > >> +struct tx_desc {
> > >> +  u32 send_addr;
> > >> +  u16 reserved_16;
> > >> +  u16 send_size;
> 
> The above doesn't look right for endianness independence.
> I'd guess the hardware spec shows a 32bit word with the 'send size'
> in one half - that is what you need to define.

It's probably good to use __be32 as the type (or possibly __be16,
depending what the reserved field actually is), to annotate the
fact that the device reads these as big-endian values from memory,
regardless of the CPU endianness.
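Outside the kernel, the conversion that `cpu_to_be32()` and the `__be32` annotation express can be sketched portably like this (a hypothetical helper for illustration, not the kernel API):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build the big-endian in-memory representation of v byte by byte,
 * so the result is correct on any host endianness. */
static uint32_t to_be32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v >> 24), (uint8_t)(v >> 16),
		(uint8_t)(v >> 8),  (uint8_t)v,
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* helper: i-th byte of v's stored representation */
static unsigned char byte_of(uint32_t v, int i)
{
	unsigned char b[4];

	memcpy(b, &v, sizeof(v));
	return b[i];
}
```

A descriptor write would then be e.g. `desc->send_addr = to_be32(dma_addr);`, and sparse can check the `__be32` type in the kernel version.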

> > >> +  u32 reserved_32;
> > >> +  u32 cfg;
> > >> +  u32 wb_addr;
> > >> +} ____cacheline_aligned;
> > >
> > > I do not think that ____cacheline_aligned is appropriate at all here.
> > >
> > > First of all, this is a hardware descriptor, so it has a fixed layout
> > > and therefore size.
> 
> The structure also isn't even a multiple of a power of two.
> So there will be implicit padding at the end.
> 
> Since there isn't a 'pointer to next' I presume the hardware accesses
> the descriptors from adjacent physical addresses.

My understanding is that in this device, the "ppe" gets a pointer
to the bus address of the descriptor through an MMIO write, and
has a hardware FIFO to keep track of the ones it still needs to
manage. The PPE is shared across multiple ethernet devices.

On a related note, it would be good if Zhangfei could add a comment
in the code to explain how the ppe avoids a FIFO overrun, because
it's not clear from the driver source.

> So you need to explicitly pad to that size.
> If the cache line size were 128 byte the above wouldn't work at all.

I originally recommended doing this, as a simplification from the
prior code that was using dma_pool_create with the cache line size
as the required alignment. While the device is used only in ARM SoCs
and ARM has a fixed cache line size of 32 bytes, you are obviously
correct that this is not something we want to rely on in a driver,
and it should use either explicit padding, or
__attribute__((__aligned__(32))) to enforce the implicit padding.
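The attribute option mentioned above can be sketched like this (a hypothetical userspace rendering with fixed-width types; 32 is the value discussed here, not a documented SoC constant):

```c
#include <assert.h>
#include <stdint.h>

/* aligned() also rounds sizeof() up to the alignment, so every element
 * of an array of descriptors is itself properly aligned. */
struct tx_desc {
	uint32_t send_addr;
	uint16_t reserved_16;
	uint16_t send_size;
	uint32_t reserved_32;
	uint32_t cfg;
	uint32_t wb_addr;
} __attribute__((__aligned__(32)));

_Static_assert(sizeof(struct tx_desc) == 32, "padded up to alignment");

static struct tx_desc ring[4];	/* every element 32-byte aligned */
```

Either form removes the hidden dependency on the CPU cache line size.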

> > I am sorry, but unfortunately this series really does NOT have TX done
> > interrupt after checked with hardware guy many times.
> > And next series will add TX done interrupt according to the feedback.
> > 
> > There are two reasons of removing the TX done interrupt when the chip is
> > designed.
> > 1. The specific product does not care the latency, only care the throughput.
> > 2. When doing many experiment, the tx done interrupt will impact the
> > throughput, as a result reclaim is moved to xmit as one of
> > optimizations, then finally tx done interrupt is removed at all.

Both of these statements are clearly wrong, you have to stop bringing
these up now and work on explaining to the hardware designers what
mistake they made. This cannot be a design decision, it can only be
a bug that has to be fixed before a new version of the chip is rolled
out!

Zhangfei, do not post another version of this driver until you are
sure you have understood the problem and explained it to the hardware
designers!
Please reread what I wrote to you in the past on IRC, and find me
there again if you still have questions.

> > Is it acceptable of removing timer as well as latency handling, or any
> > other work around of this kind of hardware?
> 
> If you don't have a global 'TX done' interrupt, you need a per
> descriptor one.
> Otherwise you cannot send at maximum rate in the absence of
> receive traffic.

Zhangfei has already talked to the hardware designers a few times,
and from what I understood, they have just not considered how
software is going to use this device, and they are too shy to
admit their mistake, while Zhangfei is trying to take the blame for
them by claiming that it works as designed. This is a very nice
gesture of him, but I'm afraid it is counterproductive to getting
the driver merged.

	Arnd

^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-08  8:07       ` zhangfei
@ 2014-04-08  8:30         ` David Laight
  -1 siblings, 0 replies; 148+ messages in thread
From: David Laight @ 2014-04-08  8:30 UTC (permalink / raw)
  To: 'zhangfei', David Miller
  Cc: linux, arnd, f.fainelli, sergei.shtylyov, mark.rutland,
	eric.dumazet, linux-arm-kernel, netdev, devicetree

From: zhangfei [mailto:zhangfei.gao@linaro.org]
> On 04/08/2014 02:53 AM, David Miller wrote:
> > From: Zhangfei Gao <zhangfei.gao@linaro.org>
> > Date: Sat,  5 Apr 2014 12:35:06 +0800
> >
> >> +struct tx_desc {
> >> +	u32 send_addr;
> >> +	u16 reserved_16;
> >> +	u16 send_size;

The above doesn't look right for endianness independence.
I'd guess the hardware spec shows a 32bit word with the 'send size'
in one half - that is what you need to define.

Since this is a tx descriptor (and written by the host) you
can't have 'reserved' field - the host has to write it.
probably these are 'must be zero' fields.

> >> +	u32 reserved_32;
> >> +	u32 cfg;
> >> +	u32 wb_addr;
> >> +} ____cacheline_aligned;
> >
> > I do not think that ____cacheline_aligned is appropriate at all here.
> >
> > First of all, this is a hardware descriptor, so it has a fixed layout
> > and therefore size.

The structure also isn't even a multiple of a power of two.
So there will be implicit padding at the end.

Since there isn't a 'pointer to next' I presume the hardware accesses
the descriptors from adjacent physical addresses.
So you need to explicitly pad to that size.
If the cache line size were 128 byte the above wouldn't work at all.

> >
> > Secondly, unless you declare this object statically in the data section
> > of the object file, the alignment doesn't matter.  These descriptors
> > are always dynamically allocated, rather than instantiated in the
> > kernel/driver image.
> 
> The ____cacheline_aligned used here is only for the requirement of
> alignment, and use dma_alloc_coherent, while at first dma_pool is used
> for the requirement of alignment.
> Otherwise desc[1] is not aligned and can not be used directly, the
> structure is smaller.

It sounds like you should be explicitly padding the structure
to 32 bytes - whether or not that is the cache line size.

...
> I am sorry, but unfortunately this series really does NOT have TX done
> interrupt after checked with hardware guy many times.
> And next series will add TX done interrupt according to the feedback.
> 
> There are two reasons of removing the TX done interrupt when the chip is
> designed.
> 1. The specific product does not care the latency, only care the throughput.
> 2. When doing many experiment, the tx done interrupt will impact the
> throughput, as a result reclaim is moved to xmit as one of
> optimizations, then finally tx done interrupt is removed at all.
> 
> Is it acceptable of removing timer as well as latency handling, or any
> other work around of this kind of hardware?

If you don't have a global 'TX done' interrupt, you need a per
descriptor one.
Otherwise you cannot send at maximum rate in the absence of
receive traffic.

	David

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-07 18:53     ` David Miller
@ 2014-04-08  8:07       ` zhangfei
  -1 siblings, 0 replies; 148+ messages in thread
From: zhangfei @ 2014-04-08  8:07 UTC (permalink / raw)
  To: David Miller
  Cc: linux, arnd, f.fainelli, sergei.shtylyov, mark.rutland,
	David.Laight, eric.dumazet, linux-arm-kernel, netdev, devicetree

Dear David,

On 04/08/2014 02:53 AM, David Miller wrote:
> From: Zhangfei Gao <zhangfei.gao@linaro.org>
> Date: Sat,  5 Apr 2014 12:35:06 +0800
>
>> +struct tx_desc {
>> +	u32 send_addr;
>> +	u16 reserved_16;
>> +	u16 send_size;
>> +	u32 reserved_32;
>> +	u32 cfg;
>> +	u32 wb_addr;
>> +} ____cacheline_aligned;
>
> I do not think that ____cacheline_aligned is appropriate at all here.
>
> First of all, this is a hardware descriptor, so it has a fixed layout
> and therefore size.
>
> Secondly, unless you declare this object statically in the data section
> of the object file, the alignment doesn't matter.  These descriptors
> are always dynamically allocated, rather than instantiated in the
> kernel/driver image.

The ____cacheline_aligned used here is only for the alignment 
requirement, together with dma_alloc_coherent; at first dma_pool was 
used to meet the alignment requirement.
Otherwise, since the structure is smaller than a cache line, desc[1] is 
not aligned and cannot be used directly.

>
>> +	val = (duplex) ? BIT(0) : 0;
>
> Parenthesis around duplex is not necessary, please remove.
OK
>
>> +static void hip04_reset_ppe(struct hip04_priv *priv)
>> +{
>> +	u32 val, tmp;
>> +
>> +	do {
>> +		regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
>> +		regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
>> +	} while (val & 0xfff);
>> +}
>
> This polling loop can spin forever if the condition never triggers.  You
> must add some kind of limit or timeout, and subsequent error handling up
> the call chain to handle this.
OK
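A bounded version of that poll could be sketched as below; `regmap_read` is replaced by a stub here, and the timeout value is illustrative, so this only shows the shape of the fix, not the driver's real register access:

```c
#include <assert.h>
#include <stdint.h>
#include <errno.h>

static unsigned int remaining = 3;	/* stub: counter drains over 3 reads */

/* stub standing in for regmap_read() of PPE_CURR_BUF_CNT */
static int regmap_read_stub(uint32_t *val)
{
	*val = remaining ? remaining-- : 0;
	return 0;
}

/* poll until the buffer count drains, but give up after a fixed
 * number of iterations and report the failure to the caller */
static int wait_ppe_drained(void)
{
	uint32_t val;
	int timeout = 1000;	/* illustrative bound */

	do {
		regmap_read_stub(&val);
		if (!(val & 0xfff))
			return 0;	/* drained */
	} while (--timeout);

	return -ETIMEDOUT;	/* caller must handle the error */
}
```

In-kernel code would typically use the `readx_poll_timeout()` helpers for this instead of an open-coded loop.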
>
>> +	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
>> +	val |= BIT(12);			/* PPE_HIS_RX_PKT_CNT read clear */
>> +	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
>   ...
>> +	/* set bus ctrl */
>> +	val = BIT(14);			/* buffer locally release */
>> +	val |= BIT(0);			/* big endian */
>> +	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
>
> Instead of having to set only one bit at a time in every register and
> adding comments here, just define these register values using macros
> properly in a header file or similar.
>
> Then you can go val |= PPE_CFG_BUS_CTRL_VAL_THIS | PPE_CFG_BUS_CTRL_VAL_THAT
> on one line.
>
> Document the registers where you define the macros, that way people can learn
> what other bits are in these register and what they mean, even if you don't
> currently use them in the driver itself.
OK, got it.
However, some bits are not planned to be documented since they are used 
internally and may be removed later.
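The macro style being asked for might look like the following; the bit positions are taken from the comments in the posted patch, but the macro names themselves are invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n)				(1u << (n))

/* PPE_CFG_BUS_CTRL_REG bit definitions (names are hypothetical) */
#define PPE_CFG_BUS_BIG_ENDIAN		BIT(0)	/* big endian */
#define PPE_CFG_BUS_LOCAL_REL		BIT(14)	/* buffer locally release */

/* the register value can then be composed on one self-describing line */
static inline uint32_t ppe_bus_ctrl_val(void)
{
	return PPE_CFG_BUS_LOCAL_REL | PPE_CFG_BUS_BIG_ENDIAN;
}
```

Defining every bit next to its register documents the hardware even for bits the driver does not currently use.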

>
>> +static void hip04_tx_reclaim(struct net_device *ndev, bool force)
>   ...
>> +static void hip04_xmit_timer(unsigned long data)
>> +{
>> +	struct net_device *ndev = (void *) data;
>> +
>> +	hip04_tx_reclaim(ndev, false);
>> +}
>   ...
>> +	mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
>
> And this is where I stop reading your driver, I've stated already that this
> kind of reclaim scheme is unacceptable.
>
> The kernel timers lack the granularity necessary to service TX reclaim
> with a reasonable amount of latency.
>
> You must use some kind of hardware notification of TX slots becoming
> available, I find it totally impossible that a modern ethernet controller
> was created without a TX done interrupt.
>

I am sorry, but unfortunately this hardware really does NOT have a TX 
done interrupt; I checked with the hardware designers many times.
The next series will add a TX done interrupt according to the feedback.

There were two reasons for removing the TX done interrupt when the chip 
was designed.
1. The specific product does not care about latency, only about throughput.
2. In many experiments, the tx done interrupt hurt throughput, so 
reclaim was moved into xmit as an optimization, and the tx done 
interrupt was finally removed altogether.

Is it acceptable to remove the timer as well as the latency handling, or 
is there any other workaround for this kind of hardware?

Thanks

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-05  4:35   ` Zhangfei Gao
@ 2014-04-07 18:56     ` David Miller
  -1 siblings, 0 replies; 148+ messages in thread
From: David Miller @ 2014-04-07 18:56 UTC (permalink / raw)
  To: zhangfei.gao
  Cc: linux, arnd, f.fainelli, sergei.shtylyov, mark.rutland,
	David.Laight, eric.dumazet, linux-arm-kernel, netdev, devicetree

From: Zhangfei Gao <zhangfei.gao@linaro.org>
Date: Sat,  5 Apr 2014 12:35:06 +0800

> +#define DESC_DEF_CFG			0x14

You absolutely cannot do this.

You must document what the bits in the TX descriptor config field
mean, all of them.

I bet there is a bit in there somewhere which tells the chip to signal
an interrupt when the packet has been sent.

But since you haven't documented the descriptor fields properly with
a full set of macro defines, we can't know what bit that is.

I really am very disappointed in the quality of this driver, and you
can expect that there will be a lot of push back and requests for
changes before this driver will be even close to being ready for
inclusion.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-05  4:35   ` Zhangfei Gao
@ 2014-04-07 18:53     ` David Miller
  -1 siblings, 0 replies; 148+ messages in thread
From: David Miller @ 2014-04-07 18:53 UTC (permalink / raw)
  To: zhangfei.gao
  Cc: linux, arnd, f.fainelli, sergei.shtylyov, mark.rutland,
	David.Laight, eric.dumazet, linux-arm-kernel, netdev, devicetree

From: Zhangfei Gao <zhangfei.gao@linaro.org>
Date: Sat,  5 Apr 2014 12:35:06 +0800

> +struct tx_desc {
> +	u32 send_addr;
> +	u16 reserved_16;
> +	u16 send_size;
> +	u32 reserved_32;
> +	u32 cfg;
> +	u32 wb_addr;
> +} ____cacheline_aligned;

I do not think that ____cacheline_aligned is appropriate at all here.

First of all, this is a hardware descriptor, so it has a fixed layout
and therefore size.

Secondly, unless you declare this object statically in the data section
of the object file, the alignment doesn't matter.  These descriptors
are always dynamically allocated, rather than instantiated in the
kernel/driver image.

> +	val = (duplex) ? BIT(0) : 0;

Parenthesis around duplex is not necessary, please remove.

> +static void hip04_reset_ppe(struct hip04_priv *priv)
> +{
> +	u32 val, tmp;
> +
> +	do {
> +		regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
> +		regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
> +	} while (val & 0xfff);
> +}

This polling loop can spin forever if the condition never triggers.  You must
add some kind of limit or timeout, and subsequent error handling up the call
chain to handle this.

> +	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
> +	val |= BIT(12);			/* PPE_HIS_RX_PKT_CNT read clear */
> +	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
 ...
> +	/* set bus ctrl */
> +	val = BIT(14);			/* buffer locally release */
> +	val |= BIT(0);			/* big endian */
> +	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);

Instead of having to set only one bit at a time in every register and
adding comments here, just define these register values using macros
properly in a header file or similar.

Then you can go val |= PPE_CFG_BUS_CTRL_VAL_THIS | PPE_CFG_BUS_CTRL_VAL_THAT
on one line.

Document the registers where you define the macros; that way people can learn
what other bits are in these registers and what they mean, even if you don't
currently use them in the driver itself.

> +static void hip04_tx_reclaim(struct net_device *ndev, bool force)
 ...
> +static void hip04_xmit_timer(unsigned long data)
> +{
> +	struct net_device *ndev = (void *) data;
> +
> +	hip04_tx_reclaim(ndev, false);
> +}
 ...
> +	mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);

And this is where I stop reading your driver, I've stated already that this
kind of reclaim scheme is unacceptable.

The kernel timers lack the granularity necessary to service TX reclaim
with a reasonable amount of latency.

You must use some kind of hardware notification of TX slots becoming
available, I find it totally impossible that a modern ethernet controller
was created without a TX done interrupt.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-05  4:35 [PATCH v7 0/3] add hisilicon " Zhangfei Gao
@ 2014-04-05  4:35   ` Zhangfei Gao
  0 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-04-05  4:35 UTC (permalink / raw)
  To: davem, linux, arnd, f.fainelli, sergei.shtylyov, mark.rutland,
	David.Laight, eric.dumazet
  Cc: linux-arm-kernel, netdev, devicetree, Zhangfei Gao

Add the Hisilicon hip04 ethernet driver, supporting the 100M / 1000M controller

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
---
 drivers/net/ethernet/hisilicon/Makefile    |    2 +-
 drivers/net/ethernet/hisilicon/hip04_eth.c |  777 ++++++++++++++++++++++++++++
 2 files changed, 778 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c

diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 1d6eb6e..17dec03 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -2,4 +2,4 @@
 # Makefile for the HISILICON network device drivers.
 #
 
-obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
+obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
new file mode 100644
index 0000000..29549a5
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -0,0 +1,777 @@
+
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+#include <linux/of_address.h>
+#include <linux/phy.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
+#include <linux/mfd/syscon.h>
+#include <linux/regmap.h>
+
+#define PPE_CFG_RX_ADDR			0x100
+#define PPE_CFG_POOL_GRP		0x300
+#define PPE_CFG_RX_BUF_SIZE		0x400
+#define PPE_CFG_RX_FIFO_SIZE		0x500
+#define PPE_CURR_BUF_CNT		0xa200
+
+#define GE_DUPLEX_TYPE			0x8
+#define GE_MAX_FRM_SIZE_REG		0x3c
+#define GE_PORT_MODE			0x40
+#define GE_PORT_EN			0x44
+#define GE_SHORT_RUNTS_THR_REG		0x50
+#define GE_TX_LOCAL_PAGE_REG		0x5c
+#define GE_TRANSMIT_CONTROL_REG		0x60
+#define GE_CF_CRC_STRIP_REG		0x1b0
+#define GE_MODE_CHANGE_EN		0x1b4
+#define GE_RECV_CONTROL_REG		0x1e0
+#define GE_STATION_MAC_ADDRESS		0x210
+#define PPE_CFG_TX_PKT_BD_ADDR		0x420
+#define PPE_CFG_MAX_FRAME_LEN_REG	0x408
+#define PPE_CFG_BUS_CTRL_REG		0x424
+#define PPE_CFG_RX_CTRL_REG		0x428
+#define PPE_CFG_RX_PKT_MODE_REG		0x438
+#define PPE_CFG_QOS_VMID_GEN		0x500
+#define PPE_CFG_RX_PKT_INT		0x538
+#define PPE_INTEN			0x600
+#define PPE_INTSTS			0x608
+#define PPE_RINT			0x604
+#define PPE_CFG_STS_MODE		0x700
+#define PPE_HIS_RX_PKT_CNT		0x804
+
+/* REG_INTERRUPT */
+#define RCV_INT				BIT(10)
+#define RCV_NOBUF			BIT(8)
+#define DEF_INT_MASK			(RCV_INT | RCV_NOBUF)
+
+#define RX_DESC_NUM			64
+#define TX_DESC_NUM			64
+#define TX_NEXT(N)			(((N) + 1) & (TX_DESC_NUM-1))
+#define RX_NEXT(N)			(((N) + 1) & (RX_DESC_NUM-1))
+
+#define GMAC_PPE_RX_PKT_MAX_LEN		379
+#define GMAC_MAX_PKT_LEN		1516
+#define DESC_DEF_CFG			0x14
+#define RX_BUF_SIZE			1600
+#define RX_PKT_ERR			0x3
+#define TX_TIMEOUT			(6 * HZ)
+#define RECLAIM_PERIOD			(HZ / 15)
+
+#define DRV_NAME			"hip04-ether"
+
+struct tx_desc {
+	u32 send_addr;
+	u16 reserved_16;
+	u16 send_size;
+	u32 reserved_32;
+	u32 cfg;
+	u32 wb_addr;
+} ____cacheline_aligned;
+
+struct rx_desc {
+	u16 reserved_16;
+	u16 pkt_len;
+	u32 reserve1[3];
+	u32 pkt_err;
+	u32 reserve2[4];
+};
+
+struct hip04_priv {
+	void __iomem *base;
+	int phy_mode;
+	int chan;
+	unsigned int port;
+	unsigned int speed;
+	unsigned int duplex;
+	unsigned int reg_inten;
+
+	struct napi_struct napi;
+	struct net_device *ndev;
+
+	struct tx_desc *tx_desc;
+	dma_addr_t tx_desc_dma;
+	struct sk_buff *tx_skb[TX_DESC_NUM];
+	dma_addr_t tx_phys[TX_DESC_NUM];
+	spinlock_t lock;
+	unsigned int tx_head;
+	unsigned int tx_tail;
+	unsigned int tx_count;
+
+	unsigned char *rx_buf[RX_DESC_NUM];
+	dma_addr_t rx_phys[RX_DESC_NUM];
+	unsigned int rx_head;
+	unsigned int rx_buf_size;
+
+	struct device_node *phy_node;
+	struct phy_device *phy;
+	struct regmap *map;
+	struct timer_list txtimer;
+};
+
+static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	priv->speed = speed;
+	priv->duplex = duplex;
+
+	switch (priv->phy_mode) {
+	case PHY_INTERFACE_MODE_SGMII:
+		if (speed == SPEED_1000)
+			val = 8;
+		else if (speed == SPEED_100)
+			val = 7;
+		else
+			val = 6;	/* SPEED_10 */
+		break;
+	case PHY_INTERFACE_MODE_MII:
+		if (speed == SPEED_100)
+			val = 1;
+		else
+			val = 0;	/* SPEED_10 */
+		break;
+	default:
+		netdev_warn(ndev, "not supported mode\n");
+		val = 0;
+		break;
+	}
+	writel_relaxed(val, priv->base + GE_PORT_MODE);
+
+	val = (duplex) ? BIT(0) : 0;
+	writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
+
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_MODE_CHANGE_EN);
+}
+
+static void hip04_reset_ppe(struct hip04_priv *priv)
+{
+	u32 val, tmp;
+
+	do {
+		regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
+		regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
+	} while (val & 0xfff);
+}
+
+static void hip04_config_fifo(struct hip04_priv *priv)
+{
+	u32 val;
+
+	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
+	val |= BIT(12);			/* PPE_HIS_RX_PKT_CNT read clear */
+	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
+
+	val = BIT(priv->port);
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
+
+	val = priv->port << 8;
+	val |= BIT(14);
+	writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
+
+	val = RX_BUF_SIZE;
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
+
+	val = RX_DESC_NUM << 16;	/* depth */
+	val |= BIT(11);			/* seq: first set first use */
+	val |= RX_DESC_NUM * priv->chan;	/* start_addr */
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_FIFO_SIZE, val);
+
+	/* pkt store format */
+	val = NET_IP_ALIGN << 11;	/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
+
+	/* following cfg required for 1000M */
+	/* pkt mode */
+	val = BIT(18);			/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
+
+	/* set bus ctrl */
+	val = BIT(14);			/* buffer locally release */
+	val |= BIT(0);			/* big endian */
+	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
+
+	/* set max pkt len, truncate if exceeded */
+	val = GMAC_PPE_RX_PKT_MAX_LEN;	/* max buffer len */
+	writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
+
+	/* set max len of each pkt */
+	val = GMAC_MAX_PKT_LEN;		/* max buffer len */
+	writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
+
+	/* set min len of each pkt */
+	val = 31;			/* min buffer len */
+	writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
+
+	/* tx */
+	val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
+	val |= BIT(5);			/* tx auto neg */
+	val |= BIT(6);			/* tx add crc */
+	val |= BIT(7);			/* tx short pad through */
+	writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
+
+	/* rx crc */
+	val = BIT(0);			/* rx strip crc */
+	writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
+
+	/* rx */
+	val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
+	val |= BIT(3);			/* rx strip pad */
+	val |= BIT(4);			/* run pkt en */
+	writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
+
+	/* auto neg control */
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
+}
+
+static void hip04_mac_enable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* enable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val |= BIT(1);		/* rx */
+	val |= BIT(2);		/* tx */
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+
+	/* clear rx int */
+	val = RCV_INT;
+	writel_relaxed(val, priv->base + PPE_RINT);
+
+	/* config recv int */
+	val = BIT(6);		/* int threshold 1 package */
+	val |= 0x4;		/* recv timeout */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
+
+	/* enable interrupt */
+	priv->reg_inten = DEF_INT_MASK;
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+}
+
+static void hip04_mac_disable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* disable int */
+	priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+	/* disable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val &= ~(BIT(1));	/* rx */
+	val &= ~(BIT(2));	/* tx */
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+}
+
+static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	writel(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
+}
+
+static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
+}
+
+static u32 hip04_recv_cnt(struct hip04_priv *priv)
+{
+	return readl(priv->base + PPE_HIS_RX_PKT_CNT);
+}
+
+static void hip04_update_mac_address(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
+			priv->base + GE_STATION_MAC_ADDRESS);
+	writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
+			(ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
+			priv->base + GE_STATION_MAC_ADDRESS + 4);
+}
+
+static int hip04_set_mac_address(struct net_device *ndev, void *addr)
+{
+	eth_mac_addr(ndev, addr);
+	hip04_update_mac_address(ndev);
+	return 0;
+}
+
+static void hip04_tx_reclaim(struct net_device *ndev, bool force)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	unsigned tx_head = priv->tx_head;
+	unsigned tx_tail = priv->tx_tail;
+	struct tx_desc *desc;
+	unsigned int bytes_compl = 0, pkts_compl = 0;
+
+	spin_lock_bh(&priv->lock);
+	while ((tx_tail != tx_head) || (priv->tx_count == TX_DESC_NUM)) {
+		desc = &priv->tx_desc[priv->tx_tail];
+		if (desc->send_addr != 0) {
+			if (force)
+				desc->send_addr = 0;
+			else
+				break;
+		}
+		if (priv->tx_phys[tx_tail]) {
+			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
+				priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
+			priv->tx_phys[tx_tail] = 0;
+		}
+		pkts_compl++;
+		bytes_compl += priv->tx_skb[tx_tail]->len;
+		dev_kfree_skb(priv->tx_skb[tx_tail]);
+		priv->tx_skb[tx_tail] = NULL;
+		tx_tail = TX_NEXT(tx_tail);
+		priv->tx_count--;
+	}
+	priv->tx_tail = tx_tail;
+	spin_unlock_bh(&priv->lock);
+
+	netdev_completed_queue(ndev, pkts_compl, bytes_compl);
+
+	if (priv->tx_count)
+		mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
+
+	if (unlikely(netif_queue_stopped(ndev)) &&
+		(priv->tx_count < TX_DESC_NUM))
+		netif_wake_queue(ndev);
+}
+
+static void hip04_xmit_timer(unsigned long data)
+{
+	struct net_device *ndev = (void *) data;
+
+	hip04_tx_reclaim(ndev, false);
+}
+
+static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int tx_head = priv->tx_head;
+	struct tx_desc *desc = &priv->tx_desc[tx_head];
+	dma_addr_t phys;
+
+	hip04_tx_reclaim(ndev, false);
+	mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
+
+	if (priv->tx_count >= TX_DESC_NUM) {
+		netif_stop_queue(ndev);
+		return NETDEV_TX_BUSY;
+	}
+
+	phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
+	if (dma_mapping_error(&ndev->dev, phys)) {
+		dev_kfree_skb(skb);
+		return NETDEV_TX_OK;
+	}
+
+	priv->tx_skb[tx_head] = skb;
+	priv->tx_phys[tx_head] = phys;
+	desc->send_addr = cpu_to_be32(phys);
+	desc->send_size = cpu_to_be16(skb->len);
+	desc->cfg = cpu_to_be32(DESC_DEF_CFG);
+	phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
+	desc->wb_addr = cpu_to_be32(phys);
+	skb_tx_timestamp(skb);
+
+	/* Don't wait for transmitted skbs to be freed. */
+	skb_orphan(skb);
+
+	hip04_set_xmit_desc(priv, phys);
+	priv->tx_head = TX_NEXT(tx_head);
+	netdev_sent_queue(ndev, skb->len);
+
+	stats->tx_bytes += skb->len;
+	stats->tx_packets++;
+	priv->tx_count++;
+
+	return NETDEV_TX_OK;
+}
+
+static int hip04_rx_poll(struct napi_struct *napi, int budget)
+{
+	struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
+	struct net_device *ndev = priv->ndev;
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int cnt = hip04_recv_cnt(priv);
+	struct rx_desc *desc;
+	struct sk_buff *skb;
+	unsigned char *buf;
+	bool last = false;
+	dma_addr_t phys;
+	int rx = 0;
+	u16 len;
+	u32 err;
+
+	while (cnt && !last) {
+		buf = priv->rx_buf[priv->rx_head];
+		skb = build_skb(buf, priv->rx_buf_size);
+		if (unlikely(!skb))
+			net_dbg_ratelimited("build_skb failed\n");
+
+		dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_phys[priv->rx_head] = 0;
+
+		desc = (struct rx_desc *)skb->data;
+		len = be16_to_cpu(desc->pkt_len);
+		err = be32_to_cpu(desc->pkt_err);
+
+		if (len > RX_BUF_SIZE)
+			len = RX_BUF_SIZE;
+
+		if (0 == len) {
+			dev_kfree_skb_any(skb);
+			last = true;
+		} else if (err & RX_PKT_ERR) {
+			dev_kfree_skb_any(skb);
+			stats->rx_dropped++;
+			stats->rx_errors++;
+		} else {
+			skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+			skb_put(skb, len);
+			skb->protocol = eth_type_trans(skb, ndev);
+			napi_gro_receive(&priv->napi, skb);
+			stats->rx_packets++;
+			stats->rx_bytes += len;
+			rx++;
+		}
+
+		buf = netdev_alloc_frag(priv->rx_buf_size);
+		if (!buf)
+			return -ENOMEM;
+		phys = dma_map_single(&ndev->dev, buf,
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			return -EIO;
+		priv->rx_buf[priv->rx_head] = buf;
+		priv->rx_phys[priv->rx_head] = phys;
+		hip04_set_recv_desc(priv, phys);
+
+		priv->rx_head = RX_NEXT(priv->rx_head);
+		if (rx >= budget)
+			break;
+
+		if (--cnt == 0)
+			cnt = hip04_recv_cnt(priv);
+	}
+
+	if (rx < budget) {
+		napi_complete(napi);
+
+		/* enable rx interrupt */
+		priv->reg_inten |= RCV_INT | RCV_NOBUF;
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+	}
+
+	return rx;
+}
+
+static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
+{
+	struct net_device *ndev = (struct net_device *) dev_id;
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
+
+	writel_relaxed(DEF_INT_MASK, priv->base + PPE_RINT);
+
+	if (ists & (RCV_INT | RCV_NOBUF)) {
+		/* disable rx interrupt */
+		priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+		napi_schedule(&priv->napi);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void hip04_adjust_link(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct phy_device *phy = priv->phy;
+
+	if ((priv->speed != phy->speed) || (priv->duplex != phy->duplex)) {
+		hip04_config_port(ndev, phy->speed, phy->duplex);
+		phy_print_status(phy);
+	}
+}
+
+static int hip04_mac_open(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->rx_head = 0;
+	priv->tx_head = 0;
+	priv->tx_tail = 0;
+	priv->tx_count = 0;
+
+	hip04_reset_ppe(priv);
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		dma_addr_t phys;
+
+		phys = dma_map_single(&ndev->dev, priv->rx_buf[i],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			return -EIO;
+
+		priv->rx_phys[i] = phys;
+		hip04_set_recv_desc(priv, phys);
+	}
+
+	if (priv->phy)
+		phy_start(priv->phy);
+
+	netif_start_queue(ndev);
+	hip04_mac_enable(ndev);
+	napi_enable(&priv->napi);
+
+	return 0;
+}
+
+static int hip04_mac_stop(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	napi_disable(&priv->napi);
+	netif_stop_queue(ndev);
+	hip04_mac_disable(ndev);
+	hip04_tx_reclaim(ndev, true);
+	hip04_reset_ppe(priv);
+
+	if (priv->phy)
+		phy_stop(priv->phy);
+
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		if (priv->rx_phys[i]) {
+			dma_unmap_single(&ndev->dev, priv->rx_phys[i],
+					RX_BUF_SIZE, DMA_FROM_DEVICE);
+			priv->rx_phys[i] = 0;
+		}
+	}
+
+	return 0;
+}
+
+static void hip04_timeout(struct net_device *ndev)
+{
+	hip04_mac_stop(ndev);
+	hip04_mac_open(ndev);
+	return;
+}
+
+static struct net_device_ops hip04_netdev_ops = {
+	.ndo_open		= hip04_mac_open,
+	.ndo_stop		= hip04_mac_stop,
+	.ndo_start_xmit		= hip04_mac_start_xmit,
+	.ndo_set_mac_address	= hip04_set_mac_address,
+	.ndo_tx_timeout         = hip04_timeout,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_change_mtu		= eth_change_mtu,
+};
+
+static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->tx_desc = dma_alloc_coherent(d,
+			TX_DESC_NUM * sizeof(struct tx_desc),
+			&priv->tx_desc_dma, GFP_KERNEL);
+	if (!priv->tx_desc)
+		return -ENOMEM;
+
+	priv->rx_buf_size = RX_BUF_SIZE +
+			    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		priv->rx_buf[i] = netdev_alloc_frag(priv->rx_buf_size);
+		if (!priv->rx_buf[i])
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void hip04_free_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	for (i = 0; i < RX_DESC_NUM; i++)
+		if (priv->rx_buf[i])
+			put_page(virt_to_head_page(priv->rx_buf[i]));
+
+	for (i = 0; i < TX_DESC_NUM; i++)
+		if (priv->tx_skb[i])
+			dev_kfree_skb_any(priv->tx_skb[i]);
+
+	dma_free_coherent(d, TX_DESC_NUM * sizeof(struct tx_desc),
+			priv->tx_desc, priv->tx_desc_dma);
+}
+
+static int hip04_mac_probe(struct platform_device *pdev)
+{
+	struct device *d = &pdev->dev;
+	struct device_node *node = d->of_node;
+	struct of_phandle_args arg;
+	struct net_device *ndev;
+	struct hip04_priv *priv;
+	struct resource *res;
+	unsigned int irq;
+	int ret;
+
+	ndev = alloc_etherdev(sizeof(struct hip04_priv));
+	if (!ndev)
+		return -ENOMEM;
+
+	priv = netdev_priv(ndev);
+	priv->ndev = ndev;
+	platform_set_drvdata(pdev, ndev);
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_ioremap_resource(d, res);
+	if (IS_ERR(priv->base)) {
+		ret = PTR_ERR(priv->base);
+		goto init_fail;
+	}
+
+	ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, &arg);
+	if (ret < 0) {
+		dev_warn(d, "no port-handle\n");
+		goto init_fail;
+	}
+
+	priv->port = arg.args[0];
+	priv->chan = arg.args[1];
+
+	priv->map = syscon_node_to_regmap(arg.np);
+	if (IS_ERR(priv->map)) {
+		dev_warn(d, "no syscon hisilicon,hip04-ppe\n");
+		ret = PTR_ERR(priv->map);
+		goto init_fail;
+	}
+
+	priv->phy_mode = of_get_phy_mode(node);
+	if (priv->phy_mode < 0) {
+		dev_warn(d, "not find phy-mode\n");
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq <= 0) {
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	ret = devm_request_irq(d, irq, hip04_mac_interrupt,
+				0, pdev->name, ndev);
+	if (ret) {
+		netdev_err(ndev, "devm_request_irq failed\n");
+		goto init_fail;
+	}
+
+	priv->phy_node = of_parse_phandle(node, "phy-handle", 0);
+	if (priv->phy_node) {
+		priv->phy = of_phy_connect(ndev, priv->phy_node,
+			&hip04_adjust_link, 0, priv->phy_mode);
+		if (!priv->phy) {
+			ret = -EPROBE_DEFER;
+			goto init_fail;
+		}
+	}
+
+	ether_setup(ndev);
+	ndev->netdev_ops = &hip04_netdev_ops;
+	ndev->watchdog_timeo = TX_TIMEOUT;
+	ndev->priv_flags |= IFF_UNICAST_FLT;
+	ndev->irq = irq;
+	netif_napi_add(ndev, &priv->napi, hip04_rx_poll, RX_DESC_NUM);
+	SET_NETDEV_DEV(ndev, &pdev->dev);
+
+	hip04_reset_ppe(priv);
+	if (priv->phy_mode == PHY_INTERFACE_MODE_MII)
+		hip04_config_port(ndev, SPEED_100, DUPLEX_FULL);
+
+	hip04_config_fifo(priv);
+	random_ether_addr(ndev->dev_addr);
+	hip04_update_mac_address(ndev);
+
+	ret = hip04_alloc_ring(ndev, d);
+	if (ret) {
+		netdev_err(ndev, "alloc ring fail\n");
+		goto alloc_fail;
+	}
+
+	setup_timer(&priv->txtimer, hip04_xmit_timer, (unsigned long) ndev);
+	ret = register_netdev(ndev);
+	if (ret) {
+		free_netdev(ndev);
+		goto alloc_fail;
+	}
+
+	return 0;
+
+alloc_fail:
+	hip04_free_ring(ndev, d);
+init_fail:
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+	return ret;
+}
+
+static int hip04_remove(struct platform_device *pdev)
+{
+	struct net_device *ndev = platform_get_drvdata(pdev);
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct device *d = &pdev->dev;
+
+	if (priv->phy)
+		phy_disconnect(priv->phy);
+
+	del_timer_sync(&priv->txtimer);
+	hip04_free_ring(ndev, d);
+	unregister_netdev(ndev);
+	free_irq(ndev->irq, ndev);
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+
+	return 0;
+}
+
+static const struct of_device_id hip04_mac_match[] = {
+	{ .compatible = "hisilicon,hip04-mac" },
+	{ }
+};
+
+static struct platform_driver hip04_mac_driver = {
+	.probe	= hip04_mac_probe,
+	.remove	= hip04_remove,
+	.driver	= {
+		.name		= DRV_NAME,
+		.owner		= THIS_MODULE,
+		.of_match_table	= hip04_mac_match,
+	},
+};
+module_platform_driver(hip04_mac_driver);
+
+MODULE_DESCRIPTION("HISILICON P04 Ethernet driver");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:hip04-ether");
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
@ 2014-04-05  4:35   ` Zhangfei Gao
  0 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-04-05  4:35 UTC (permalink / raw)
  To: linux-arm-kernel

Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
---
 drivers/net/ethernet/hisilicon/Makefile    |    2 +-
 drivers/net/ethernet/hisilicon/hip04_eth.c |  777 ++++++++++++++++++++++++++++
 2 files changed, 778 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c

diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 1d6eb6e..17dec03 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -2,4 +2,4 @@
 # Makefile for the HISILICON network device drivers.
 #
 
-obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
+obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
new file mode 100644
index 0000000..29549a5
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -0,0 +1,777 @@
+
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+#include <linux/of_address.h>
+#include <linux/phy.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
+#include <linux/mfd/syscon.h>
+#include <linux/regmap.h>
+
+#define PPE_CFG_RX_ADDR			0x100
+#define PPE_CFG_POOL_GRP		0x300
+#define PPE_CFG_RX_BUF_SIZE		0x400
+#define PPE_CFG_RX_FIFO_SIZE		0x500
+#define PPE_CURR_BUF_CNT		0xa200
+
+#define GE_DUPLEX_TYPE			0x8
+#define GE_MAX_FRM_SIZE_REG		0x3c
+#define GE_PORT_MODE			0x40
+#define GE_PORT_EN			0x44
+#define GE_SHORT_RUNTS_THR_REG		0x50
+#define GE_TX_LOCAL_PAGE_REG		0x5c
+#define GE_TRANSMIT_CONTROL_REG		0x60
+#define GE_CF_CRC_STRIP_REG		0x1b0
+#define GE_MODE_CHANGE_EN		0x1b4
+#define GE_RECV_CONTROL_REG		0x1e0
+#define GE_STATION_MAC_ADDRESS		0x210
+#define PPE_CFG_TX_PKT_BD_ADDR		0x420
+#define PPE_CFG_MAX_FRAME_LEN_REG	0x408
+#define PPE_CFG_BUS_CTRL_REG		0x424
+#define PPE_CFG_RX_CTRL_REG		0x428
+#define PPE_CFG_RX_PKT_MODE_REG		0x438
+#define PPE_CFG_QOS_VMID_GEN		0x500
+#define PPE_CFG_RX_PKT_INT		0x538
+#define PPE_INTEN			0x600
+#define PPE_INTSTS			0x608
+#define PPE_RINT			0x604
+#define PPE_CFG_STS_MODE		0x700
+#define PPE_HIS_RX_PKT_CNT		0x804
+
+/* REG_INTERRUPT */
+#define RCV_INT				BIT(10)
+#define RCV_NOBUF			BIT(8)
+#define DEF_INT_MASK			(RCV_INT | RCV_NOBUF)
+
+#define RX_DESC_NUM			64
+#define TX_DESC_NUM			64
+#define TX_NEXT(N)			(((N) + 1) & (TX_DESC_NUM-1))
+#define RX_NEXT(N)			(((N) + 1) & (RX_DESC_NUM-1))
+
+#define GMAC_PPE_RX_PKT_MAX_LEN		379
+#define GMAC_MAX_PKT_LEN		1516
+#define DESC_DEF_CFG			0x14
+#define RX_BUF_SIZE			1600
+#define RX_PKT_ERR			0x3
+#define TX_TIMEOUT			(6 * HZ)
+#define RECLAIM_PERIOD			(HZ / 15)
+
+#define DRV_NAME			"hip04-ether"
+
+struct tx_desc {
+	u32 send_addr;
+	u16 reserved_16;
+	u16 send_size;
+	u32 reserved_32;
+	u32 cfg;
+	u32 wb_addr;
+} ____cacheline_aligned;
+
+struct rx_desc {
+	u16 reserved_16;
+	u16 pkt_len;
+	u32 reserve1[3];
+	u32 pkt_err;
+	u32 reserve2[4];
+};
+
+struct hip04_priv {
+	void __iomem *base;
+	int phy_mode;
+	int chan;
+	unsigned int port;
+	unsigned int speed;
+	unsigned int duplex;
+	unsigned int reg_inten;
+
+	struct napi_struct napi;
+	struct net_device *ndev;
+
+	struct tx_desc *tx_desc;
+	dma_addr_t tx_desc_dma;
+	struct sk_buff *tx_skb[TX_DESC_NUM];
+	dma_addr_t tx_phys[TX_DESC_NUM];
+	spinlock_t lock;
+	unsigned int tx_head;
+	unsigned int tx_tail;
+	unsigned int tx_count;
+
+	unsigned char *rx_buf[RX_DESC_NUM];
+	dma_addr_t rx_phys[RX_DESC_NUM];
+	unsigned int rx_head;
+	unsigned int rx_buf_size;
+
+	struct device_node *phy_node;
+	struct phy_device *phy;
+	struct regmap *map;
+	struct timer_list txtimer;
+};
+
+static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	priv->speed = speed;
+	priv->duplex = duplex;
+
+	switch (priv->phy_mode) {
+	case PHY_INTERFACE_MODE_SGMII:
+		if (speed == SPEED_1000)
+			val = 8;
+		else if (speed == SPEED_100)
+			val = 7;
+		else
+			val = 6;	/* SPEED_10 */
+		break;
+	case PHY_INTERFACE_MODE_MII:
+		if (speed == SPEED_100)
+			val = 1;
+		else
+			val = 0;	/* SPEED_10 */
+		break;
+	default:
+		netdev_warn(ndev, "unsupported phy mode\n");
+		val = 0;
+		break;
+	}
+	writel_relaxed(val, priv->base + GE_PORT_MODE);
+
+	val = (duplex) ? BIT(0) : 0;
+	writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
+
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_MODE_CHANGE_EN);
+}
+
+static void hip04_reset_ppe(struct hip04_priv *priv)
+{
+	u32 val, tmp;
+
+	do {
+		regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
+		regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
+	} while (val & 0xfff);
+}
+
+static void hip04_config_fifo(struct hip04_priv *priv)
+{
+	u32 val;
+
+	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
+	val |= BIT(12);			/* PPE_HIS_RX_PKT_CNT read clear */
+	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
+
+	val = BIT(priv->port);
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
+
+	val = priv->port << 8;
+	val |= BIT(14);
+	writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
+
+	val = RX_BUF_SIZE;
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
+
+	val = RX_DESC_NUM << 16;	/* depth */
+	val |= BIT(11);			/* seq: first set first use */
+	val |= RX_DESC_NUM * priv->chan;	/* start_addr */
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_FIFO_SIZE, val);
+
+	/* pkt store format */
+	val = NET_IP_ALIGN << 11;	/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
+
+	/* following cfg required for 1000M */
+	/* pkt mode */
+	val = BIT(18);			/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
+
+	/* set bus ctrl */
+	val = BIT(14);			/* buffer locally release */
+	val |= BIT(0);			/* big endian */
+	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
+
+	/* set max pkt len, curtail if exceed */
+	val = GMAC_PPE_RX_PKT_MAX_LEN;	/* max buffer len */
+	writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
+
+	/* set max len of each pkt */
+	val = GMAC_MAX_PKT_LEN;		/* max buffer len */
+	writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
+
+	/* set min len of each pkt */
+	val = 31;			/* min buffer len */
+	writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
+
+	/* tx */
+	val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
+	val |= BIT(5);			/* tx auto neg */
+	val |= BIT(6);			/* tx add crc */
+	val |= BIT(7);			/* tx short pad through */
+	writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
+
+	/* rx crc */
+	val = BIT(0);			/* rx strip crc */
+	writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
+
+	/* rx */
+	val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
+	val |= BIT(3);			/* rx strip pad */
+	val |= BIT(4);			/* run pkt en */
+	writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
+
+	/* auto neg control */
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
+}
+
+static void hip04_mac_enable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* enable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val |= BIT(1);		/* rx */
+	val |= BIT(2);		/* tx */
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+
+	/* clear rx int */
+	val = RCV_INT;
+	writel_relaxed(val, priv->base + PPE_RINT);
+
+	/* config recv int */
+	val = BIT(6);		/* int threshold 1 package */
+	val |= 0x4;		/* recv timeout */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
+
+	/* enable interrupt */
+	priv->reg_inten = DEF_INT_MASK;
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+}
+
+static void hip04_mac_disable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* disable int */
+	priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+	/* disable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val &= ~(BIT(1));	/* rx */
+	val &= ~(BIT(2));	/* tx */
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+}
+
+static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	writel(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
+}
+
+static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
+}
+
+static u32 hip04_recv_cnt(struct hip04_priv *priv)
+{
+	return readl(priv->base + PPE_HIS_RX_PKT_CNT);
+}
+
+static void hip04_update_mac_address(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
+			priv->base + GE_STATION_MAC_ADDRESS);
+	writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
+			(ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
+			priv->base + GE_STATION_MAC_ADDRESS + 4);
+}
+
+static int hip04_set_mac_address(struct net_device *ndev, void *addr)
+{
+	eth_mac_addr(ndev, addr);
+	hip04_update_mac_address(ndev);
+	return 0;
+}
+
+static void hip04_tx_reclaim(struct net_device *ndev, bool force)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	unsigned tx_head = priv->tx_head;
+	unsigned tx_tail = priv->tx_tail;
+	struct tx_desc *desc;
+	unsigned int bytes_compl = 0, pkts_compl = 0;
+
+	spin_lock_bh(&priv->lock);
+	while ((tx_tail != tx_head) || (priv->tx_count == TX_DESC_NUM)) {
+		desc = &priv->tx_desc[tx_tail];
+		if (desc->send_addr != 0) {
+			if (force)
+				desc->send_addr = 0;
+			else
+				break;
+		}
+		if (priv->tx_phys[tx_tail]) {
+			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
+				priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
+			priv->tx_phys[tx_tail] = 0;
+		}
+		pkts_compl++;
+		bytes_compl += priv->tx_skb[tx_tail]->len;
+		dev_kfree_skb(priv->tx_skb[tx_tail]);
+		priv->tx_skb[tx_tail] = NULL;
+		tx_tail = TX_NEXT(tx_tail);
+		priv->tx_count--;
+	}
+	priv->tx_tail = tx_tail;
+	spin_unlock_bh(&priv->lock);
+
+	netdev_completed_queue(ndev, pkts_compl, bytes_compl);
+
+	if (priv->tx_count)
+		mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
+
+	if (unlikely(netif_queue_stopped(ndev)) &&
+		(priv->tx_count < TX_DESC_NUM))
+		netif_wake_queue(ndev);
+}
+
+static void hip04_xmit_timer(unsigned long data)
+{
+	struct net_device *ndev = (void *) data;
+
+	hip04_tx_reclaim(ndev, false);
+}
+
+static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int tx_head = priv->tx_head;
+	struct tx_desc *desc = &priv->tx_desc[tx_head];
+	dma_addr_t phys;
+
+	hip04_tx_reclaim(ndev, false);
+	mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
+
+	if (priv->tx_count >= TX_DESC_NUM) {
+		netif_stop_queue(ndev);
+		return NETDEV_TX_BUSY;
+	}
+
+	phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
+	if (dma_mapping_error(&ndev->dev, phys)) {
+		dev_kfree_skb(skb);
+		return NETDEV_TX_OK;
+	}
+
+	priv->tx_skb[tx_head] = skb;
+	priv->tx_phys[tx_head] = phys;
+	desc->send_addr = cpu_to_be32(phys);
+	desc->send_size = cpu_to_be16(skb->len);
+	desc->cfg = cpu_to_be32(DESC_DEF_CFG);
+	phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
+	desc->wb_addr = cpu_to_be32(phys);
+	skb_tx_timestamp(skb);
+
+	/* Don't wait up for transmitted skbs to be freed. */
+	skb_orphan(skb);
+
+	hip04_set_xmit_desc(priv, phys);
+	priv->tx_head = TX_NEXT(tx_head);
+	netdev_sent_queue(ndev, skb->len);
+
+	stats->tx_bytes += skb->len;
+	stats->tx_packets++;
+	priv->tx_count++;
+
+	return NETDEV_TX_OK;
+}
+
+static int hip04_rx_poll(struct napi_struct *napi, int budget)
+{
+	struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
+	struct net_device *ndev = priv->ndev;
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int cnt = hip04_recv_cnt(priv);
+	struct rx_desc *desc;
+	struct sk_buff *skb;
+	unsigned char *buf;
+	bool last = false;
+	dma_addr_t phys;
+	int rx = 0;
+	u16 len;
+	u32 err;
+
+	while (cnt && !last) {
+		buf = priv->rx_buf[priv->rx_head];
+		skb = build_skb(buf, priv->rx_buf_size);
+		if (unlikely(!skb)) {
+			net_dbg_ratelimited("build_skb failed\n");
+			break;
+		}
+
+		dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_phys[priv->rx_head] = 0;
+
+		desc = (struct rx_desc *)skb->data;
+		len = be16_to_cpu(desc->pkt_len);
+		err = be32_to_cpu(desc->pkt_err);
+
+		if (len > RX_BUF_SIZE)
+			len = RX_BUF_SIZE;
+
+		if (len == 0) {
+			dev_kfree_skb_any(skb);
+			last = true;
+		} else if (err & RX_PKT_ERR) {
+			dev_kfree_skb_any(skb);
+			stats->rx_dropped++;
+			stats->rx_errors++;
+		} else {
+			skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+			skb_put(skb, len);
+			skb->protocol = eth_type_trans(skb, ndev);
+			napi_gro_receive(&priv->napi, skb);
+			stats->rx_packets++;
+			stats->rx_bytes += len;
+			rx++;
+		}
+
+		buf = netdev_alloc_frag(priv->rx_buf_size);
+		if (!buf)
+			return -ENOMEM;
+		phys = dma_map_single(&ndev->dev, buf,
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			return -EIO;
+		priv->rx_buf[priv->rx_head] = buf;
+		priv->rx_phys[priv->rx_head] = phys;
+		hip04_set_recv_desc(priv, phys);
+
+		priv->rx_head = RX_NEXT(priv->rx_head);
+		if (rx >= budget)
+			break;
+
+		if (--cnt == 0)
+			cnt = hip04_recv_cnt(priv);
+	}
+
+	if (rx < budget) {
+		napi_complete(napi);
+
+		/* enable rx interrupt */
+		priv->reg_inten |= RCV_INT | RCV_NOBUF;
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+	}
+
+	return rx;
+}
+
+static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
+{
+	struct net_device *ndev = (struct net_device *) dev_id;
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
+
+	writel_relaxed(DEF_INT_MASK, priv->base + PPE_RINT);
+
+	if (ists & (RCV_INT | RCV_NOBUF)) {
+		/* disable rx interrupt */
+		priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+		napi_schedule(&priv->napi);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void hip04_adjust_link(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct phy_device *phy = priv->phy;
+
+	if ((priv->speed != phy->speed) || (priv->duplex != phy->duplex)) {
+		hip04_config_port(ndev, phy->speed, phy->duplex);
+		phy_print_status(phy);
+	}
+}
+
+static int hip04_mac_open(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->rx_head = 0;
+	priv->tx_head = 0;
+	priv->tx_tail = 0;
+	priv->tx_count = 0;
+
+	hip04_reset_ppe(priv);
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		dma_addr_t phys;
+
+		phys = dma_map_single(&ndev->dev, priv->rx_buf[i],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			return -EIO;
+
+		priv->rx_phys[i] = phys;
+		hip04_set_recv_desc(priv, phys);
+	}
+
+	if (priv->phy)
+		phy_start(priv->phy);
+
+	netif_start_queue(ndev);
+	hip04_mac_enable(ndev);
+	napi_enable(&priv->napi);
+
+	return 0;
+}
+
+static int hip04_mac_stop(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	napi_disable(&priv->napi);
+	netif_stop_queue(ndev);
+	hip04_mac_disable(ndev);
+	hip04_tx_reclaim(ndev, true);
+	hip04_reset_ppe(priv);
+
+	if (priv->phy)
+		phy_stop(priv->phy);
+
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		if (priv->rx_phys[i]) {
+			dma_unmap_single(&ndev->dev, priv->rx_phys[i],
+					RX_BUF_SIZE, DMA_FROM_DEVICE);
+			priv->rx_phys[i] = 0;
+		}
+	}
+
+	return 0;
+}
+
+static void hip04_timeout(struct net_device *ndev)
+{
+	hip04_mac_stop(ndev);
+	hip04_mac_open(ndev);
+}
+
+static const struct net_device_ops hip04_netdev_ops = {
+	.ndo_open		= hip04_mac_open,
+	.ndo_stop		= hip04_mac_stop,
+	.ndo_start_xmit		= hip04_mac_start_xmit,
+	.ndo_set_mac_address	= hip04_set_mac_address,
+	.ndo_tx_timeout         = hip04_timeout,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_change_mtu		= eth_change_mtu,
+};
+
+static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->tx_desc = dma_alloc_coherent(d,
+			TX_DESC_NUM * sizeof(struct tx_desc),
+			&priv->tx_desc_dma, GFP_KERNEL);
+	if (!priv->tx_desc)
+		return -ENOMEM;
+
+	priv->rx_buf_size = RX_BUF_SIZE +
+			    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		priv->rx_buf[i] = netdev_alloc_frag(priv->rx_buf_size);
+		if (!priv->rx_buf[i])
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void hip04_free_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	for (i = 0; i < RX_DESC_NUM; i++)
+		if (priv->rx_buf[i])
+			put_page(virt_to_head_page(priv->rx_buf[i]));
+
+	for (i = 0; i < TX_DESC_NUM; i++)
+		if (priv->tx_skb[i])
+			dev_kfree_skb_any(priv->tx_skb[i]);
+
+	dma_free_coherent(d, TX_DESC_NUM * sizeof(struct tx_desc),
+			priv->tx_desc, priv->tx_desc_dma);
+}
+
+static int hip04_mac_probe(struct platform_device *pdev)
+{
+	struct device *d = &pdev->dev;
+	struct device_node *node = d->of_node;
+	struct of_phandle_args arg;
+	struct net_device *ndev;
+	struct hip04_priv *priv;
+	struct resource *res;
+	int irq;
+	int ret;
+
+	ndev = alloc_etherdev(sizeof(struct hip04_priv));
+	if (!ndev)
+		return -ENOMEM;
+
+	priv = netdev_priv(ndev);
+	priv->ndev = ndev;
+	platform_set_drvdata(pdev, ndev);
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_ioremap_resource(d, res);
+	if (IS_ERR(priv->base)) {
+		ret = PTR_ERR(priv->base);
+		goto init_fail;
+	}
+
+	ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, &arg);
+	if (ret < 0) {
+		dev_warn(d, "no port-handle\n");
+		goto init_fail;
+	}
+
+	priv->port = arg.args[0];
+	priv->chan = arg.args[1];
+
+	priv->map = syscon_node_to_regmap(arg.np);
+	if (IS_ERR(priv->map)) {
+		dev_warn(d, "no syscon hisilicon,hip04-ppe\n");
+		ret = PTR_ERR(priv->map);
+		goto init_fail;
+	}
+
+	priv->phy_mode = of_get_phy_mode(node);
+	if (priv->phy_mode < 0) {
+		dev_warn(d, "phy-mode not found\n");
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq <= 0) {
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	ret = devm_request_irq(d, irq, hip04_mac_interrupt,
+				0, pdev->name, ndev);
+	if (ret) {
+		netdev_err(ndev, "devm_request_irq failed\n");
+		goto init_fail;
+	}
+
+	priv->phy_node = of_parse_phandle(node, "phy-handle", 0);
+	if (priv->phy_node) {
+		priv->phy = of_phy_connect(ndev, priv->phy_node,
+			&hip04_adjust_link, 0, priv->phy_mode);
+		if (!priv->phy) {
+			ret = -EPROBE_DEFER;
+			goto init_fail;
+		}
+	}
+
+	ether_setup(ndev);
+	ndev->netdev_ops = &hip04_netdev_ops;
+	ndev->watchdog_timeo = TX_TIMEOUT;
+	ndev->priv_flags |= IFF_UNICAST_FLT;
+	ndev->irq = irq;
+	netif_napi_add(ndev, &priv->napi, hip04_rx_poll, RX_DESC_NUM);
+	SET_NETDEV_DEV(ndev, &pdev->dev);
+
+	hip04_reset_ppe(priv);
+	if (priv->phy_mode == PHY_INTERFACE_MODE_MII)
+		hip04_config_port(ndev, SPEED_100, DUPLEX_FULL);
+
+	hip04_config_fifo(priv);
+	random_ether_addr(ndev->dev_addr);
+	hip04_update_mac_address(ndev);
+
+	ret = hip04_alloc_ring(ndev, d);
+	if (ret) {
+		netdev_err(ndev, "alloc ring fail\n");
+		goto alloc_fail;
+	}
+
+	setup_timer(&priv->txtimer, hip04_xmit_timer, (unsigned long) ndev);
+	ret = register_netdev(ndev);
+	if (ret)
+		goto alloc_fail;
+
+	return 0;
+
+alloc_fail:
+	hip04_free_ring(ndev, d);
+init_fail:
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+	return ret;
+}
+
+static int hip04_remove(struct platform_device *pdev)
+{
+	struct net_device *ndev = platform_get_drvdata(pdev);
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct device *d = &pdev->dev;
+
+	if (priv->phy)
+		phy_disconnect(priv->phy);
+
+	del_timer_sync(&priv->txtimer);
+	unregister_netdev(ndev);
+	hip04_free_ring(ndev, d);
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+
+	return 0;
+}
+
+static const struct of_device_id hip04_mac_match[] = {
+	{ .compatible = "hisilicon,hip04-mac" },
+	{ }
+};
+
+static struct platform_driver hip04_mac_driver = {
+	.probe	= hip04_mac_probe,
+	.remove	= hip04_remove,
+	.driver	= {
+		.name		= DRV_NAME,
+		.owner		= THIS_MODULE,
+		.of_match_table	= hip04_mac_match,
+	},
+};
+module_platform_driver(hip04_mac_driver);
+
+MODULE_DESCRIPTION("HISILICON HIP04 Ethernet driver");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:hip04-ether");
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-04 15:16 [PATCH v6 0/3] add hisilicon " Zhangfei Gao
@ 2014-04-04 15:16     ` Zhangfei Gao
  0 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-04-04 15:16 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q, linux-lFZ/pmaqli7XmaaqVzeoHQ,
	arnd-r2nGTMty4D4, f.fainelli-Re5JQEeQqe8AvxtiuMwx3w,
	sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
	mark.rutland-5wv7dgnIgG8, David.Laight-ZS65k/vG3HxXrIkS9f7CXA,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Zhangfei Gao

Add support for the Hisilicon hip04 ethernet controller, covering both the 100M and 1000M variants.

Signed-off-by: Zhangfei Gao <zhangfei.gao-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
---
 drivers/net/ethernet/hisilicon/Makefile    |    2 +-
 drivers/net/ethernet/hisilicon/hip04_eth.c |  777 ++++++++++++++++++++++++++++
 2 files changed, 778 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c

diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 1d6eb6e..17dec03 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -2,4 +2,4 @@
 # Makefile for the HISILICON network device drivers.
 #
 
-obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
+obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
new file mode 100644
index 0000000..b5b8b2f
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -0,0 +1,777 @@
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+#include <linux/of_address.h>
+#include <linux/phy.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
+#include <linux/mfd/syscon.h>
+#include <linux/regmap.h>
+
+#define PPE_CFG_RX_ADDR			0x100
+#define PPE_CFG_POOL_GRP		0x300
+#define PPE_CFG_RX_BUF_SIZE		0x400
+#define PPE_CFG_RX_FIFO_SIZE		0x500
+#define PPE_CURR_BUF_CNT		0xa200
+
+#define GE_DUPLEX_TYPE			0x8
+#define GE_MAX_FRM_SIZE_REG		0x3c
+#define GE_PORT_MODE			0x40
+#define GE_PORT_EN			0x44
+#define GE_SHORT_RUNTS_THR_REG		0x50
+#define GE_TX_LOCAL_PAGE_REG		0x5c
+#define GE_TRANSMIT_CONTROL_REG		0x60
+#define GE_CF_CRC_STRIP_REG		0x1b0
+#define GE_MODE_CHANGE_EN		0x1b4
+#define GE_RECV_CONTROL_REG		0x1e0
+#define GE_STATION_MAC_ADDRESS		0x210
+#define PPE_CFG_TX_PKT_BD_ADDR		0x420
+#define PPE_CFG_MAX_FRAME_LEN_REG	0x408
+#define PPE_CFG_BUS_CTRL_REG		0x424
+#define PPE_CFG_RX_CTRL_REG		0x428
+#define PPE_CFG_RX_PKT_MODE_REG		0x438
+#define PPE_CFG_QOS_VMID_GEN		0x500
+#define PPE_CFG_RX_PKT_INT		0x538
+#define PPE_INTEN			0x600
+#define PPE_INTSTS			0x608
+#define PPE_RINT			0x604
+#define PPE_CFG_STS_MODE		0x700
+#define PPE_HIS_RX_PKT_CNT		0x804
+
+/* REG_INTERRUPT */
+#define RCV_INT				BIT(10)
+#define RCV_NOBUF			BIT(8)
+#define DEF_INT_MASK			(RCV_INT | RCV_NOBUF)
+
+#define RX_DESC_NUM			64
+#define TX_DESC_NUM			64
+#define TX_NEXT(N)			(((N) + 1) & (TX_DESC_NUM-1))
+#define RX_NEXT(N)			(((N) + 1) & (RX_DESC_NUM-1))
+
+#define GMAC_PPE_RX_PKT_MAX_LEN		379
+#define GMAC_MAX_PKT_LEN		1516
+#define DESC_DEF_CFG			0x14
+#define RX_BUF_SIZE			1600
+#define RX_PKT_ERR			0x3
+#define TX_TIMEOUT			(6 * HZ)
+#define RECLAIM_PERIOD			(HZ / 15)
+
+#define DRV_NAME			"hip04-ether"
+
+struct tx_desc {
+	u32 send_addr;
+	u16 reserved_16;
+	u16 send_size;
+	u32 reserved_32;
+	u32 cfg;
+	u32 wb_addr;
+} ____cacheline_aligned;
+
+struct rx_desc {
+	u16 reserved_16;
+	u16 pkt_len;
+	u32 reserve1[3];
+	u32 pkt_err;
+	u32 reserve2[4];
+};
+
+struct hip04_priv {
+	void __iomem *base;
+	int phy_mode;
+	int chan;
+	unsigned int port;
+	unsigned int speed;
+	unsigned int duplex;
+	unsigned int reg_inten;
+
+	struct napi_struct napi;
+	struct net_device *ndev;
+
+	struct tx_desc *tx_desc;
+	dma_addr_t tx_desc_dma;
+	struct sk_buff *tx_skb[TX_DESC_NUM];
+	dma_addr_t tx_phys[TX_DESC_NUM];
+	spinlock_t lock;
+	unsigned int tx_head;
+	unsigned int tx_tail;
+	unsigned int tx_count;
+
+	unsigned char *rx_buf[RX_DESC_NUM];
+	dma_addr_t rx_phys[RX_DESC_NUM];
+	unsigned int rx_head;
+	unsigned int rx_buf_size;
+
+	struct device_node *phy_node;
+	struct phy_device *phy;
+	struct regmap *map;
+	struct timer_list txtimer;
+};
+
+static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	priv->speed = speed;
+	priv->duplex = duplex;
+
+	switch (priv->phy_mode) {
+	case PHY_INTERFACE_MODE_SGMII:
+		if (speed == SPEED_1000)
+			val = 8;
+		else if (speed == SPEED_100)
+			val = 7;
+		else
+			val = 6;	/* SPEED_10 */
+		break;
+	case PHY_INTERFACE_MODE_MII:
+		if (speed == SPEED_100)
+			val = 1;
+		else
+			val = 0;	/* SPEED_10 */
+		break;
+	default:
+		netdev_warn(ndev, "unsupported phy mode\n");
+		val = 0;
+		break;
+	}
+	writel_relaxed(val, priv->base + GE_PORT_MODE);
+
+	val = (duplex) ? BIT(0) : 0;
+	writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
+
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_MODE_CHANGE_EN);
+}
+
+static void hip04_reset_ppe(struct hip04_priv *priv)
+{
+	u32 val, tmp;
+
+	do {
+		regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
+		regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
+	} while (val & 0xfff);
+}
+
+static void hip04_config_fifo(struct hip04_priv *priv)
+{
+	u32 val;
+
+	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
+	val |= BIT(12);			/* PPE_HIS_RX_PKT_CNT read clear */
+	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
+
+	val = BIT(priv->port);
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
+
+	val = priv->port << 8;
+	val |= BIT(14);
+	writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
+
+	val = RX_BUF_SIZE;
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
+
+	val = RX_DESC_NUM << 16;	/* depth */
+	val |= BIT(11);			/* seq: first set first use */
+	val |= RX_DESC_NUM * priv->chan;	/* start_addr */
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_FIFO_SIZE, val);
+
+	/* pkt store format */
+	val = NET_IP_ALIGN << 11;	/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
+
+	/* following cfg required for 1000M */
+	/* pkt mode */
+	val = BIT(18);			/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
+
+	/* set bus ctrl */
+	val = BIT(14);			/* buffer locally release */
+	val |= BIT(0);			/* big endian */
+	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
+
+	/* set max pkt len, curtail if exceed */
+	val = GMAC_PPE_RX_PKT_MAX_LEN;	/* max buffer len */
+	writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
+
+	/* set max len of each pkt */
+	val = GMAC_MAX_PKT_LEN;		/* max buffer len */
+	writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
+
+	/* set min len of each pkt */
+	val = 31;			/* min buffer len */
+	writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
+
+	/* tx */
+	val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
+	val |= BIT(5);			/* tx auto neg */
+	val |= BIT(6);			/* tx add crc */
+	val |= BIT(7);			/* tx short pad through */
+	writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
+
+	/* rx crc */
+	val = BIT(0);			/* rx strip crc */
+	writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
+
+	/* rx */
+	val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
+	val |= BIT(3);			/* rx strip pad */
+	val |= BIT(4);			/* run pkt en */
+	writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
+
+	/* auto neg control */
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
+}
+
+static void hip04_mac_enable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* enable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val |= BIT(1);		/* rx */
+	val |= BIT(2);		/* tx */
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+
+	/* clear rx int */
+	val = RCV_INT;
+	writel_relaxed(val, priv->base + PPE_RINT);
+
+	/* config recv int */
+	val = BIT(6);		/* int threshold 1 package */
+	val |= 0x4;		/* recv timeout */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
+
+	/* enable interrupt */
+	priv->reg_inten = DEF_INT_MASK;
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+}
+
+static void hip04_mac_disable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* disable int */
+	priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+	/* disable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val &= ~(BIT(1));	/* rx */
+	val &= ~(BIT(2));	/* tx */
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+}
+
+static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	writel(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
+}
+
+static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
+}
+
+static u32 hip04_recv_cnt(struct hip04_priv *priv)
+{
+	return readl(priv->base + PPE_HIS_RX_PKT_CNT);
+}
+
+static void hip04_update_mac_address(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
+			priv->base + GE_STATION_MAC_ADDRESS);
+	writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
+			(ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
+			priv->base + GE_STATION_MAC_ADDRESS + 4);
+}
+
+static int hip04_set_mac_address(struct net_device *ndev, void *addr)
+{
+	eth_mac_addr(ndev, addr);
+	hip04_update_mac_address(ndev);
+	return 0;
+}
+
+static void hip04_tx_reclaim(struct net_device *ndev, bool force)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	unsigned tx_head = priv->tx_head;
+	unsigned tx_tail = priv->tx_tail;
+	struct tx_desc *desc;
+	unsigned int bytes_compl = 0, pkts_compl = 0;
+
+	spin_lock_bh(&priv->lock);
+	while ((tx_tail != tx_head) || (priv->tx_count == TX_DESC_NUM)) {
+		desc = &priv->tx_desc[tx_tail];
+		if (desc->send_addr != 0) {
+			if (force)
+				desc->send_addr = 0;
+			else
+				break;
+		}
+		if (priv->tx_phys[tx_tail]) {
+			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
+				priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
+			priv->tx_phys[tx_tail] = 0;
+		}
+		pkts_compl++;
+		bytes_compl += priv->tx_skb[tx_tail]->len;
+		dev_kfree_skb(priv->tx_skb[tx_tail]);
+		priv->tx_skb[tx_tail] = NULL;
+		tx_tail = TX_NEXT(tx_tail);
+		priv->tx_count--;
+	}
+	priv->tx_tail = tx_tail;
+	spin_unlock_bh(&priv->lock);
+
+	netdev_completed_queue(ndev, pkts_compl, bytes_compl);
+
+	if (priv->tx_count)
+		mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
+
+	if (unlikely(netif_queue_stopped(ndev)) &&
+		(priv->tx_count < TX_DESC_NUM))
+		netif_wake_queue(ndev);
+}
+
+static void hip04_xmit_timer(unsigned long data)
+{
+	struct net_device *ndev = (void *) data;
+
+	hip04_tx_reclaim(ndev, false);
+}
+
+static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int tx_head = priv->tx_head;
+	struct tx_desc *desc = &priv->tx_desc[tx_head];
+	dma_addr_t phys;
+
+	hip04_tx_reclaim(ndev, false);
+	mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
+
+	if (priv->tx_count >= TX_DESC_NUM) {
+		netif_stop_queue(ndev);
+		return NETDEV_TX_BUSY;
+	}
+
+	phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
+	if (dma_mapping_error(&ndev->dev, phys)) {
+		dev_kfree_skb(skb);
+		return NETDEV_TX_OK;
+	}
+
+	priv->tx_skb[tx_head] = skb;
+	priv->tx_phys[tx_head] = phys;
+	desc->send_addr = cpu_to_be32(phys);
+	desc->send_size = cpu_to_be16(skb->len);
+	desc->cfg = cpu_to_be32(DESC_DEF_CFG);
+	phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
+	desc->wb_addr = cpu_to_be32(phys);
+	skb_tx_timestamp(skb);
+
+	/* Don't wait up for transmitted skbs to be freed. */
+	skb_orphan(skb);
+
+	hip04_set_xmit_desc(priv, phys);
+	priv->tx_head = TX_NEXT(tx_head);
+	netdev_sent_queue(ndev, skb->len);
+
+	stats->tx_bytes += skb->len;
+	stats->tx_packets++;
+	priv->tx_count++;
+
+	return NETDEV_TX_OK;
+}
+
+static int hip04_rx_poll(struct napi_struct *napi, int budget)
+{
+	struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
+	struct net_device *ndev = priv->ndev;
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int cnt = hip04_recv_cnt(priv);
+	struct rx_desc *desc;
+	struct sk_buff *skb;
+	unsigned char *buf;
+	bool last = false;
+	dma_addr_t phys;
+	int rx = 0;
+	u16 len;
+	u32 err;
+
+	while (cnt && !last) {
+		buf = priv->rx_buf[priv->rx_head];
+		skb = build_skb(buf, priv->rx_buf_size);
+		if (unlikely(!skb)) {
+			net_dbg_ratelimited("build_skb failed\n");
+			break;
+		}
+
+		dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_phys[priv->rx_head] = 0;
+
+		desc = (struct rx_desc *)skb->data;
+		len = be16_to_cpu(desc->pkt_len);
+		err = be32_to_cpu(desc->pkt_err);
+
+		if (len > RX_BUF_SIZE)
+			len = RX_BUF_SIZE;
+
+		if (len == 0) {
+			dev_kfree_skb_any(skb);
+			last = true;
+		} else if (err & RX_PKT_ERR) {
+			dev_kfree_skb_any(skb);
+			stats->rx_dropped++;
+			stats->rx_errors++;
+		} else {
+			skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+			skb_put(skb, len);
+			skb->protocol = eth_type_trans(skb, ndev);
+			napi_gro_receive(&priv->napi, skb);
+			stats->rx_packets++;
+			stats->rx_bytes += len;
+			rx++;
+		}
+
+		buf = netdev_alloc_frag(priv->rx_buf_size);
+		if (!buf)
+			return -ENOMEM;
+		phys = dma_map_single(&ndev->dev, buf,
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			return -EIO;
+		priv->rx_buf[priv->rx_head] = buf;
+		priv->rx_phys[priv->rx_head] = phys;
+		hip04_set_recv_desc(priv, phys);
+
+		priv->rx_head = RX_NEXT(priv->rx_head);
+		if (rx >= budget)
+			break;
+
+		if (--cnt == 0)
+			cnt = hip04_recv_cnt(priv);
+	}
+
+	if (rx < budget) {
+		napi_complete(napi);
+
+		/* enable rx interrupt */
+		priv->reg_inten |= RCV_INT | RCV_NOBUF;
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+	}
+
+	return rx;
+}
+
+static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
+{
+	struct net_device *ndev = (struct net_device *) dev_id;
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
+
+	if (!ists)
+		return IRQ_NONE;
+
+	writel_relaxed(DEF_INT_MASK, priv->base + PPE_RINT);
+
+	if (ists & (RCV_INT | RCV_NOBUF)) {
+		/* disable rx interrupt */
+		priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+		napi_schedule(&priv->napi);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void hip04_adjust_link(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct phy_device *phy = priv->phy;
+
+	if ((priv->speed != phy->speed) || (priv->duplex != phy->duplex)) {
+		hip04_config_port(ndev, phy->speed, phy->duplex);
+		phy_print_status(phy);
+	}
+}
+
+static int hip04_mac_open(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->rx_head = 0;
+	priv->tx_head = 0;
+	priv->tx_tail = 0;
+	priv->tx_count = 0;
+
+	hip04_reset_ppe(priv);
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		dma_addr_t phys;
+
+		phys = dma_map_single(&ndev->dev, priv->rx_buf[i],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			return -EIO;
+
+		priv->rx_phys[i] = phys;
+		hip04_set_recv_desc(priv, phys);
+	}
+
+	if (priv->phy)
+		phy_start(priv->phy);
+
+	napi_enable(&priv->napi);
+	netif_start_queue(ndev);
+	hip04_mac_enable(ndev);
+
+	return 0;
+}
+
+static int hip04_mac_stop(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	napi_disable(&priv->napi);
+	netif_stop_queue(ndev);
+	hip04_mac_disable(ndev);
+	hip04_tx_reclaim(ndev, true);
+	hip04_reset_ppe(priv);
+
+	if (priv->phy)
+		phy_stop(priv->phy);
+
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		if (priv->rx_phys[i]) {
+			dma_unmap_single(&ndev->dev, priv->rx_phys[i],
+					RX_BUF_SIZE, DMA_FROM_DEVICE);
+			priv->rx_phys[i] = 0;
+		}
+	}
+
+	return 0;
+}
+
+static void hip04_timeout(struct net_device *ndev)
+{
+	hip04_mac_stop(ndev);
+	hip04_mac_open(ndev);
+}
+
+static const struct net_device_ops hip04_netdev_ops = {
+	.ndo_open		= hip04_mac_open,
+	.ndo_stop		= hip04_mac_stop,
+	.ndo_start_xmit		= hip04_mac_start_xmit,
+	.ndo_set_mac_address	= hip04_set_mac_address,
+	.ndo_tx_timeout         = hip04_timeout,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_change_mtu		= eth_change_mtu,
+};
+
+static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->tx_desc = dma_alloc_coherent(d,
+			TX_DESC_NUM * sizeof(struct tx_desc),
+			&priv->tx_desc_dma, GFP_KERNEL);
+	if (!priv->tx_desc)
+		return -ENOMEM;
+
+	priv->rx_buf_size = RX_BUF_SIZE +
+			    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		priv->rx_buf[i] = netdev_alloc_frag(priv->rx_buf_size);
+		if (!priv->rx_buf[i])
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void hip04_free_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	for (i = 0; i < RX_DESC_NUM; i++)
+		if (priv->rx_buf[i])
+			put_page(virt_to_head_page(priv->rx_buf[i]));
+
+	for (i = 0; i < TX_DESC_NUM; i++)
+		if (priv->tx_skb[i])
+			dev_kfree_skb_any(priv->tx_skb[i]);
+
+	dma_free_coherent(d, TX_DESC_NUM * sizeof(struct tx_desc),
+			priv->tx_desc, priv->tx_desc_dma);
+}
+
+static int hip04_mac_probe(struct platform_device *pdev)
+{
+	struct device *d = &pdev->dev;
+	struct device_node *node = d->of_node;
+	struct of_phandle_args arg;
+	struct net_device *ndev;
+	struct hip04_priv *priv;
+	struct resource *res;
+	int irq;
+	int ret;
+
+	ndev = alloc_etherdev(sizeof(struct hip04_priv));
+	if (!ndev)
+		return -ENOMEM;
+
+	priv = netdev_priv(ndev);
+	priv->ndev = ndev;
+	platform_set_drvdata(pdev, ndev);
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_ioremap_resource(d, res);
+	if (IS_ERR(priv->base)) {
+		ret = PTR_ERR(priv->base);
+		goto init_fail;
+	}
+
+	ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, &arg);
+	if (ret < 0) {
+		dev_warn(d, "no port-handle\n");
+		goto init_fail;
+	}
+
+	priv->port = arg.args[0];
+	priv->chan = arg.args[1];
+
+	priv->map = syscon_node_to_regmap(arg.np);
+	if (IS_ERR(priv->map)) {
+		dev_warn(d, "no syscon hisilicon,hip04-ppe\n");
+		ret = PTR_ERR(priv->map);
+		goto init_fail;
+	}
+
+	priv->phy_mode = of_get_phy_mode(node);
+	if (priv->phy_mode < 0) {
+		dev_warn(d, "failed to get phy-mode\n");
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq <= 0) {
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	ret = devm_request_irq(d, irq, hip04_mac_interrupt,
+				0, pdev->name, ndev);
+	if (ret) {
+		netdev_err(ndev, "devm_request_irq failed\n");
+		goto init_fail;
+	}
+
+	priv->phy_node = of_parse_phandle(node, "phy-handle", 0);
+	if (priv->phy_node) {
+		priv->phy = of_phy_connect(ndev, priv->phy_node,
+			&hip04_adjust_link, 0, priv->phy_mode);
+		if (!priv->phy) {
+			ret = -EPROBE_DEFER;
+			goto init_fail;
+		}
+	}
+
+	ndev->netdev_ops = &hip04_netdev_ops;
+	ndev->watchdog_timeo = TX_TIMEOUT;
+	ndev->priv_flags |= IFF_UNICAST_FLT;
+	ndev->irq = irq;
+	netif_napi_add(ndev, &priv->napi, hip04_rx_poll, RX_DESC_NUM);
+	SET_NETDEV_DEV(ndev, &pdev->dev);
+
+	hip04_reset_ppe(priv);
+	if (priv->phy_mode == PHY_INTERFACE_MODE_MII)
+		hip04_config_port(ndev, SPEED_100, DUPLEX_FULL);
+
+	hip04_config_fifo(priv);
+	eth_hw_addr_random(ndev);
+	hip04_update_mac_address(ndev);
+
+	ret = hip04_alloc_ring(ndev, d);
+	if (ret) {
+		netdev_err(ndev, "alloc ring fail\n");
+		goto alloc_fail;
+	}
+
+	setup_timer(&priv->txtimer, hip04_xmit_timer, (unsigned long) ndev);
+	ret = register_netdev(ndev);
+	if (ret)
+		goto alloc_fail;
+
+	return 0;
+
+alloc_fail:
+	hip04_free_ring(ndev, d);
+init_fail:
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+	return ret;
+}
+
+static int hip04_remove(struct platform_device *pdev)
+{
+	struct net_device *ndev = platform_get_drvdata(pdev);
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct device *d = &pdev->dev;
+
+	if (priv->phy)
+		phy_disconnect(priv->phy);
+
+	del_timer_sync(&priv->txtimer);
+	unregister_netdev(ndev);
+	hip04_free_ring(ndev, d);
+	/* the irq was requested with devm_request_irq(), no free_irq() here */
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+
+	return 0;
+}
+
+static const struct of_device_id hip04_mac_match[] = {
+	{ .compatible = "hisilicon,hip04-mac" },
+	{ }
+};
+
+static struct platform_driver hip04_mac_driver = {
+	.probe	= hip04_mac_probe,
+	.remove	= hip04_remove,
+	.driver	= {
+		.name		= DRV_NAME,
+		.owner		= THIS_MODULE,
+		.of_match_table	= hip04_mac_match,
+	},
+};
+module_platform_driver(hip04_mac_driver);
+
+MODULE_DESCRIPTION("Hisilicon hip04 Ethernet driver");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:hip04-ether");
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 148+ messages in thread


* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-03 15:27         ` Russell King - ARM Linux
@ 2014-04-04  6:52           ` Zhangfei Gao
  -1 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-04-04  6:52 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Arnd Bergmann, Mark Rutland, devicetree, Florian Fainelli,
	eric.dumazet, Sergei Shtylyov, netdev, David Laight,
	Zhangfei Gao, David S. Miller, linux-arm-kernel

Dear Russell

On Thu, Apr 3, 2014 at 11:27 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Apr 02, 2014 at 11:21:45AM +0200, Arnd Bergmann wrote:
>> - As David Laight pointed out earlier, you must also ensure that
>>   you don't have too much /data/ pending in the descriptor ring
>>   when you stop the queue. For a 10mbit connection, you have already
>>   tested (as we discussed on IRC) that 64 descriptors with 1500 byte
>>   frames gives you a 68ms round-trip ping time, which is too much.
>>   Conversely, on 1gbit, having only 64 descriptors actually seems
>>   a little low, and you may be able to get better throughput if
>>   you extend the ring to e.g. 512 descriptors.
>
> You don't manage that by stopping the queue - there's separate interfaces
> where you report how many bytes you've queued (netdev_sent_queue()) and
> how many bytes/packets you've sent (netdev_tx_completed_queue()).  This
> allows the netdev schedulers to limit how much data is held in the queue,
> preserving interactivity while allowing the advantages of larger rings.

My god, it's awesome.
The latency can be solved by adding netdev_sent_queue() in xmit and
netdev_completed_queue() in reclaim.
In the experiment, iperf -P 3 reaches 930 Mbit/s, and ping gets a
response within 0.4 ms at the same time.

Does that mean the timer-based reclaim should be removed entirely?
The background is:
1. There is no xmit-complete interrupt.
2. Only reclaiming used buffers from xmit achieves the best throughput,
so the timer is added in case no further xmit arrives to trigger reclaim.

>
>> > +       phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
>> > +       if (dma_mapping_error(&ndev->dev, phys)) {
>> > +               dev_kfree_skb(skb);
>> > +               return NETDEV_TX_OK;
>> > +       }
>> > +
>> > +       priv->tx_skb[tx_head] = skb;
>> > +       priv->tx_phys[tx_head] = phys;
>> > +       desc->send_addr = cpu_to_be32(phys);
>> > +       desc->send_size = cpu_to_be16(skb->len);
>> > +       desc->cfg = cpu_to_be32(DESC_DEF_CFG);
>> > +       phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
>> > +       desc->wb_addr = cpu_to_be32(phys);
>>
>> One detail: since you don't have cache-coherent DMA, "desc" will
>> reside in uncached memory, so you try to minimize the number of accesses.
>> It's probably faster if you build the descriptor on the stack and
>> then atomically copy it over, rather than assigning each member at
>> a time.
>
> DMA coherent memory is write combining, so multiple writes will be
> coalesced.  This also means that barriers may be required to ensure the
> descriptors are pushed out in a timely manner if something like writel()
> is not used in the transmit-triggering path.
>
Currently writel() is used in xmit, and regmap_write(), which resolves
to writel(), is used in poll.

Thanks

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-03 15:27         ` Russell King - ARM Linux
@ 2014-04-03 17:57           ` Arnd Bergmann
  -1 siblings, 0 replies; 148+ messages in thread
From: Arnd Bergmann @ 2014-04-03 17:57 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Zhangfei Gao, davem, f.fainelli, sergei.shtylyov, mark.rutland,
	David.Laight, eric.dumazet, linux-arm-kernel, netdev, devicetree

On Thursday 03 April 2014 16:27:46 Russell King - ARM Linux wrote:
> On Wed, Apr 02, 2014 at 11:21:45AM +0200, Arnd Bergmann wrote:
> > - As David Laight pointed out earlier, you must also ensure that
> >   you don't have too much /data/ pending in the descriptor ring
> >   when you stop the queue. For a 10mbit connection, you have already
> >   tested (as we discussed on IRC) that 64 descriptors with 1500 byte
> >   frames gives you a 68ms round-trip ping time, which is too much.
> >   Conversely, on 1gbit, having only 64 descriptors actually seems
> >   a little low, and you may be able to get better throughput if
> >   you extend the ring to e.g. 512 descriptors.
> 
> You don't manage that by stopping the queue - there's separate interfaces
> where you report how many bytes you've queued (netdev_sent_queue()) and
> how many bytes/packets you've sent (netdev_tx_completed_queue()).  This
> allows the netdev schedulers to limit how much data is held in the queue,
> preserving interactivity while allowing the advantages of larger rings.

Ah, I didn't know about these.  However, reading through the dql code,
it seems that will not work if the tx reclaim is triggered by a timer,
since it expects to get feedback from the actual hardware behavior. :(

I guess this is (part of) what David Miller also meant by saying it won't
ever work properly. 

> > > +       phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
> > > +       if (dma_mapping_error(&ndev->dev, phys)) {
> > > +               dev_kfree_skb(skb);
> > > +               return NETDEV_TX_OK;
> > > +       }
> > > +
> > > +       priv->tx_skb[tx_head] = skb;
> > > +       priv->tx_phys[tx_head] = phys;
> > > +       desc->send_addr = cpu_to_be32(phys);
> > > +       desc->send_size = cpu_to_be16(skb->len);
> > > +       desc->cfg = cpu_to_be32(DESC_DEF_CFG);
> > > +       phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
> > > +       desc->wb_addr = cpu_to_be32(phys);
> > 
> > One detail: since you don't have cache-coherent DMA, "desc" will
> > reside in uncached memory, so you try to minimize the number of accesses.
> > It's probably faster if you build the descriptor on the stack and
> > then atomically copy it over, rather than assigning each member at
> > a time.
> 
> DMA coherent memory is write combining, so multiple writes will be
> coalesced.  This also means that barriers may be required to ensure the
> descriptors are pushed out in a timely manner if something like writel()
> is not used in the transmit-triggering path.

Right, makes sense. There is a writel() right after this, so no need
for extra barriers. We already concluded that the store operation on
uncached memory isn't actually a problem, and Zhangfei Gao did some
measurements to check the overhead of the one read from uncached
memory that is in the tx path, which was lost in the noise.

	Arnd

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-03 15:42           ` David Laight
@ 2014-04-03 15:50             ` Russell King - ARM Linux
  -1 siblings, 0 replies; 148+ messages in thread
From: Russell King - ARM Linux @ 2014-04-03 15:50 UTC (permalink / raw)
  To: David Laight
  Cc: Arnd Bergmann, Zhangfei Gao, davem, f.fainelli, sergei.shtylyov,
	mark.rutland, eric.dumazet, linux-arm-kernel, netdev, devicetree

On Thu, Apr 03, 2014 at 03:42:00PM +0000, David Laight wrote:
> From: Russell King - ARM Linux
> > DMA coherent memory is write combining, so multiple writes will be
> > coalesced.  This also means that barriers may be required to ensure the
> > descriptors are pushed out in a timely manner if something like writel()
> > is not used in the transmit-triggering path.
> 
> You also have to ensure that the write that changes the 'owner'
> bit is the one that happens last.
> 
> If (for example) a descriptor has two words, one containing the
> buffer address and the other containing the length and flags,
> then you have to absolutely ensure that the hardware will not
> read the new 'flags' with the old 'buffer address'.
> Any write to tell the hardware to look at the tx ring won't
> help you - the hardware might be reading the descriptor anyway.
> 
> Even if the accesses are uncached, you need the appropriate
> barrier (or volatiles) to stop gcc reordering the writes.
> (I think accesses to volatiles can't be reordered - check.)

Exactly... I wish Freescale were as thoughtful as you are on this point. :)

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-03 15:27         ` Russell King - ARM Linux
@ 2014-04-03 15:42           ` David Laight
  -1 siblings, 0 replies; 148+ messages in thread
From: David Laight @ 2014-04-03 15:42 UTC (permalink / raw)
  To: 'Russell King - ARM Linux', Arnd Bergmann
  Cc: Zhangfei Gao, davem, f.fainelli, sergei.shtylyov, mark.rutland,
	eric.dumazet, linux-arm-kernel, netdev, devicetree

From: Russell King - ARM Linux
> DMA coherent memory is write combining, so multiple writes will be
> coalesced.  This also means that barriers may be required to ensure the
> descriptors are pushed out in a timely manner if something like writel()
> is not used in the transmit-triggering path.

You also have to ensure that the write that changes the 'owner'
bit is the one that happens last.

If (for example) a descriptor has two words, one containing the
buffer address and the other containing the length and flags,
then you have to absolutely ensure that the hardware will not
read the new 'flags' with the old 'buffer address'.
Any write to tell the hardware to look at the tx ring won't
help you - the hardware might be reading the descriptor anyway.

Even if the accesses are uncached, you need the appropriate
barrier (or volatiles) to stop gcc reordering the writes.
(I think accesses to volatiles can't be reordered - check.)

	David

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-02 10:04         ` David Laight
@ 2014-04-03 15:38           ` zhangfei
  -1 siblings, 0 replies; 148+ messages in thread
From: zhangfei @ 2014-04-03 15:38 UTC (permalink / raw)
  To: David Laight, 'Arnd Bergmann'
  Cc: davem, linux, f.fainelli, sergei.shtylyov, mark.rutland,
	eric.dumazet, linux-arm-kernel, netdev, devicetree

Dear David

On 04/02/2014 06:04 PM, David Laight wrote:
> From: Arnd Bergmann
>> On Tuesday 01 April 2014 21:27:12 Zhangfei Gao wrote:
>>> +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>>
>> While it looks like there are no serious functionality bugs left, this
>> function is rather inefficient, as has been pointed out before:
>>
>>> +{
>>> +       struct hip04_priv *priv = netdev_priv(ndev);
>>> +       struct net_device_stats *stats = &ndev->stats;
>>> +       unsigned int tx_head = priv->tx_head;
>>> +       struct tx_desc *desc = &priv->tx_desc[tx_head];
>>> +       dma_addr_t phys;
>>> +
>>> +       hip04_tx_reclaim(ndev, false);
>>> +       mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
>>> +
>>> +       if (priv->tx_count >= TX_DESC_NUM) {
>>> +               netif_stop_queue(ndev);
>>> +               return NETDEV_TX_BUSY;
>>> +       }
>>
>> This is where you have two problems:
>>
>> - if the descriptor ring is full, you wait for RECLAIM_PERIOD,
>>    which is far too long at 500ms, because during that time you
>>    are not able to add further data to the stopped queue.
>
> Best to have some idea how long it will take for the ring to empty.
> IIRC you need a count of the bytes in the tx ring anyway.
> There isn't much point waking up until most of the queued
> transmits have had time to complete.

In fact, there is no good way to check when a packet has actually been
sent out except to check whether the descriptor becomes 0, and even that
is not accurate in 100M mode.
The hardware engineers suggest we can assume the data has been sent out
once it is handed to the DMA.
Though this works with iperf, I still hesitate to rely on it.

>
>> - As David Laight pointed out earlier, you must also ensure that
>>    you don't have too much /data/ pending in the descriptor ring
>>    when you stop the queue. For a 10mbit connection, you have already
>>    tested (as we discussed on IRC) that 64 descriptors with 1500 byte
>>    frames gives you a 68ms round-trip ping time, which is too much.
>>    Conversely, on 1gbit, having only 64 descriptors actually seems
>>    a little low, and you may be able to get better throughput if
>>    you extend the ring to e.g. 512 descriptors.
>
> The descriptor count matters most for small packets.
> There are workloads (I've got one) that can send 1000s of small packets/sec
> on a single TCP connection (there will be receive traffic).
>
>>> +       phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
>>> +       if (dma_mapping_error(&ndev->dev, phys)) {
>>> +               dev_kfree_skb(skb);
>>> +               return NETDEV_TX_OK;
>>> +       }
>>> +
>>> +       priv->tx_skb[tx_head] = skb;
>>> +       priv->tx_phys[tx_head] = phys;
>>> +       desc->send_addr = cpu_to_be32(phys);
>>> +       desc->send_size = cpu_to_be16(skb->len);
>>> +       desc->cfg = cpu_to_be32(DESC_DEF_CFG);
>>> +       phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
>>> +       desc->wb_addr = cpu_to_be32(phys);
>>
>> One detail: since you don't have cache-coherent DMA, "desc" will
>> reside in uncached memory, so you try to minimize the number of accesses.
>> It's probably faster if you build the descriptor on the stack and
>> then atomically copy it over, rather than assigning each member at
>> a time.
>
> I'm not sure, the writes to uncached memory will probably be
> asynchronous, but you may avoid a stall by separating the
> cycles in time.
> What you need to avoid is reads from uncached memory.
> It may well be beneficial for the tx reclaim code to first
> check whether all the transmits have completed (likely)
> instead of testing each descriptor in turn.

That may not be needed, since there is no better way to check whether
all packets have been sent out.

For the 100M interface, throughput is 94 Mbit/s, so there is almost no
room for improvement.
The 1G interface is rather fast when checked with iperf at default
settings. Only one buffer is in use at a time, so each reclaim frees
only one buffer.

However, I just found that throughput improves a lot when increasing
the number of threads:
iperf -P 1, throughput: 420 Mbit/s
iperf -P 2, throughput: 740 Mbit/s
iperf -P 4, throughput: 930 Mbit/s


>
>> The same would be true for the rx descriptors.
>
> Actually it is reasonably feasible to put the rx descriptors
> in cacheable memory and to flush the cache lines after adding
> new entries.
> You just need to add the entries one cache line full at a time
> (and ensure that the rx processing code doesn't dirty the line).
>
> Without cache-coherent memory cached tx descriptors are much harder work.
>
> 	David
>
>
>
>
>

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-02  9:21       ` Arnd Bergmann
@ 2014-04-03 15:27         ` Russell King - ARM Linux
  -1 siblings, 0 replies; 148+ messages in thread
From: Russell King - ARM Linux @ 2014-04-03 15:27 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Zhangfei Gao, davem, f.fainelli, sergei.shtylyov, mark.rutland,
	David.Laight, eric.dumazet, linux-arm-kernel, netdev, devicetree

On Wed, Apr 02, 2014 at 11:21:45AM +0200, Arnd Bergmann wrote:
> - As David Laight pointed out earlier, you must also ensure that
>   you don't have too much /data/ pending in the descriptor ring
>   when you stop the queue. For a 10mbit connection, you have already
>   tested (as we discussed on IRC) that 64 descriptors with 1500 byte
>   frames gives you a 68ms round-trip ping time, which is too much.
>   Conversely, on 1gbit, having only 64 descriptors actually seems
>   a little low, and you may be able to get better throughput if
>   you extend the ring to e.g. 512 descriptors.

You don't manage that by stopping the queue - there's separate interfaces
where you report how many bytes you've queued (netdev_sent_queue()) and
how many bytes/packets you've sent (netdev_tx_completed_queue()).  This
allows the netdev schedulers to limit how much data is held in the queue,
preserving interactivity while allowing the advantages of larger rings.

> > +       phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
> > +       if (dma_mapping_error(&ndev->dev, phys)) {
> > +               dev_kfree_skb(skb);
> > +               return NETDEV_TX_OK;
> > +       }
> > +
> > +       priv->tx_skb[tx_head] = skb;
> > +       priv->tx_phys[tx_head] = phys;
> > +       desc->send_addr = cpu_to_be32(phys);
> > +       desc->send_size = cpu_to_be16(skb->len);
> > +       desc->cfg = cpu_to_be32(DESC_DEF_CFG);
> > +       phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
> > +       desc->wb_addr = cpu_to_be32(phys);
> 
> One detail: since you don't have cache-coherent DMA, "desc" will
> reside in uncached memory, so you try to minimize the number of accesses.
> It's probably faster if you build the descriptor on the stack and
> then atomically copy it over, rather than assigning each member at
> a time.

DMA coherent memory is write combining, so multiple writes will be
coalesced.  This also means that barriers may be required to ensure the
descriptors are pushed out in a timely manner if something like writel()
is not used in the transmit-triggering path.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-02 10:04         ` David Laight
@ 2014-04-03 15:22           ` David Miller
  -1 siblings, 0 replies; 148+ messages in thread
From: David Miller @ 2014-04-03 15:22 UTC (permalink / raw)
  To: David.Laight
  Cc: arnd, zhangfei.gao, linux, f.fainelli, sergei.shtylyov,
	mark.rutland, eric.dumazet, linux-arm-kernel, netdev, devicetree

From: David Laight <David.Laight@ACULAB.COM>
Date: Wed, 2 Apr 2014 10:04:34 +0000

> From: Arnd Bergmann
>> On Tuesday 01 April 2014 21:27:12 Zhangfei Gao wrote:
>> > +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>> 
>> While it looks like there are no serious functionality bugs left, this
>> function is rather inefficient, as has been pointed out before:
>> 
>> > +{
>> > +       struct hip04_priv *priv = netdev_priv(ndev);
>> > +       struct net_device_stats *stats = &ndev->stats;
>> > +       unsigned int tx_head = priv->tx_head;
>> > +       struct tx_desc *desc = &priv->tx_desc[tx_head];
>> > +       dma_addr_t phys;
>> > +
>> > +       hip04_tx_reclaim(ndev, false);
>> > +       mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
>> > +
>> > +       if (priv->tx_count >= TX_DESC_NUM) {
>> > +               netif_stop_queue(ndev);
>> > +               return NETDEV_TX_BUSY;
>> > +       }
>> 
>> This is where you have two problems:
>> 
>> - if the descriptor ring is full, you wait for RECLAIM_PERIOD,
>>   which is far too long at 500ms, because during that time you
>>   are not able to add further data to the stopped queue.
> 
> Best to have some idea how long it will take for the ring to empty.
> IIRC you need a count of the bytes in the tx ring anyway.
> There isn't much point waking up until most of the queued
> transmits have had time to complete.

There is absolutely no doubt in my mind that you cannot use timers
for this, it simply will never work properly.

There needs to be a real notification from the device of some
sort, and I don't care what form that comes in.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-03  6:24             ` Zhangfei Gao
@ 2014-04-03  8:35                 ` Arnd Bergmann
  -1 siblings, 0 replies; 148+ messages in thread
From: Arnd Bergmann @ 2014-04-03  8:35 UTC (permalink / raw)
  To: Zhangfei Gao
  Cc: David Laight, mark.rutland-5wv7dgnIgG8,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, linux-lFZ/pmaqli7XmaaqVzeoHQ,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
	netdev-u79uwXL29TY76Z2rM5mHXA, Zhangfei Gao,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thursday 03 April 2014 14:24:25 Zhangfei Gao wrote:
> On Wed, Apr 2, 2014 at 11:49 PM, Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org> wrote:
> > On Wednesday 02 April 2014 10:04:34 David Laight wrote:
> >> What you need to avoid is reads from uncached memory.
> >> It may well be beneficial for the tx reclaim code to first
> >> check whether all the transmits have completed (likely)
> >> instead of testing each descriptor in turn.
> >
> > Good point, reading from noncached memory is actually the
> > part that matters. For slow networks (e.g. 10mbit), checking if
> > all of the descriptors have finished is not quite as likely to succeed
> > as for fast (gbit), especially if the timeout is set to expire
> > before all descriptors have completed.
> >
> > If it makes a lot of difference to performance, one could use
> > a binary search over the outstanding descriptors rather than looking
> > just at the last one.
> >
> 
> I am afraid there may be no simple way to check whether all transmits have completed.

Why can't you do the trivial change that David suggested above? It
sounds like a three line change to your current code. No need to do
the binary change at first, just see what difference it makes.

> Still want to enable the cache-coherent feature first.
> That brings two benefits:
> 1. the DMA buffer becomes cacheable.
> 2. the descriptors can directly use cacheable memory, so the performance
> concern here may be solved accordingly.
> 
> So how about using this version as the first version, while tuning the
> performance in the next step?
> Currently, the gbit interface can reach 420 Mbit/s in iperf, and the
> 100M interface can reach 94 Mbit/s.

It sounds like a very simple thing to try and you'd know immediately
if it helps or not.

Besides, you still have to change the other two issues I mentioned
regarding the tx reclaim, so you can do all three at once.

	Arnd
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-02 15:49           ` Arnd Bergmann
@ 2014-04-03  6:24             ` Zhangfei Gao
  -1 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-04-03  6:24 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: David Laight, mark.rutland-5wv7dgnIgG8,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w, linux-lFZ/pmaqli7XmaaqVzeoHQ,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
	netdev-u79uwXL29TY76Z2rM5mHXA, Zhangfei Gao,
	davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Wed, Apr 2, 2014 at 11:49 PM, Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org> wrote:
> On Wednesday 02 April 2014 10:04:34 David Laight wrote:
>> From: Arnd Bergmann
>> > On Tuesday 01 April 2014 21:27:12 Zhangfei Gao wrote:
>> > > +       phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
>> > > +       if (dma_mapping_error(&ndev->dev, phys)) {
>> > > +               dev_kfree_skb(skb);
>> > > +               return NETDEV_TX_OK;
>> > > +       }
>> > > +
>> > > +       priv->tx_skb[tx_head] = skb;
>> > > +       priv->tx_phys[tx_head] = phys;
>> > > +       desc->send_addr = cpu_to_be32(phys);
>> > > +       desc->send_size = cpu_to_be16(skb->len);
>> > > +       desc->cfg = cpu_to_be32(DESC_DEF_CFG);
>> > > +       phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
>> > > +       desc->wb_addr = cpu_to_be32(phys);
>> >
>> > One detail: since you don't have cache-coherent DMA, "desc" will
>> > reside in uncached memory, so you try to minimize the number of accesses.
>> > It's probably faster if you build the descriptor on the stack and
>> > then atomically copy it over, rather than assigning each member at
>> > a time.
>>
>> I'm not sure, the writes to uncached memory will probably be
>> asynchronous, but you may avoid a stall by separating the
>> cycles in time.
>
> Right.
>
>> What you need to avoid is reads from uncached memory.
>> It may well be beneficial for the tx reclaim code to first
>> check whether all the transmits have completed (likely)
>> instead of testing each descriptor in turn.
>
> Good point, reading from noncached memory is actually the
> part that matters. For slow networks (e.g. 10mbit), checking if
> all of the descriptors have finished is not quite as likely to succeed
> as for fast (gbit), especially if the timeout is set to expire
> before all descriptors have completed.
>
> If it makes a lot of difference to performance, one could use
> a binary search over the outstanding descriptors rather than looking
> just at the last one.
>

I am afraid there may be no simple way to check whether all transmits have completed.

Still want to enable the cache-coherent feature first.
That brings two benefits:
1. the DMA buffer becomes cacheable.
2. the descriptors can directly use cacheable memory, so the performance
concern here may be solved accordingly.

So how about using this version as the first version, while tuning the
performance in the next step?
Currently, the gbit interface can reach 420 Mbit/s in iperf, and the
100M interface can reach 94 Mbit/s.

Thanks

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-02 10:04         ` David Laight
@ 2014-04-02 15:49           ` Arnd Bergmann
  -1 siblings, 0 replies; 148+ messages in thread
From: Arnd Bergmann @ 2014-04-02 15:49 UTC (permalink / raw)
  To: David Laight
  Cc: Zhangfei Gao, davem, linux, f.fainelli, sergei.shtylyov,
	mark.rutland, eric.dumazet, linux-arm-kernel, netdev, devicetree

On Wednesday 02 April 2014 10:04:34 David Laight wrote:
> From: Arnd Bergmann
> > On Tuesday 01 April 2014 21:27:12 Zhangfei Gao wrote:
> > > +       phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
> > > +       if (dma_mapping_error(&ndev->dev, phys)) {
> > > +               dev_kfree_skb(skb);
> > > +               return NETDEV_TX_OK;
> > > +       }
> > > +
> > > +       priv->tx_skb[tx_head] = skb;
> > > +       priv->tx_phys[tx_head] = phys;
> > > +       desc->send_addr = cpu_to_be32(phys);
> > > +       desc->send_size = cpu_to_be16(skb->len);
> > > +       desc->cfg = cpu_to_be32(DESC_DEF_CFG);
> > > +       phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
> > > +       desc->wb_addr = cpu_to_be32(phys);
> > 
> > One detail: since you don't have cache-coherent DMA, "desc" will
> > reside in uncached memory, so you try to minimize the number of accesses.
> > It's probably faster if you build the descriptor on the stack and
> > then atomically copy it over, rather than assigning each member at
> > a time.
> 
> I'm not sure, the writes to uncached memory will probably be
> asynchronous, but you may avoid a stall by separating the
> cycles in time.

Right.

> What you need to avoid is reads from uncached memory.
> It may well be beneficial for the tx reclaim code to first
> check whether all the transmits have completed (likely)
> instead of testing each descriptor in turn.

Good point, reading from noncached memory is actually the
part that matters. For slow networks (e.g. 10mbit), checking if
all of the descriptors have finished is not quite as likely to succeed
as for fast (gbit), especially if the timeout is set to expire
before all descriptors have completed.

If it makes a lot of difference to performance, one could use
a binary search over the outstanding descriptors rather than looking
just at the last one.
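
A userspace sketch of that binary search (hypothetical names; done[]
models the per-descriptor completion flag read from uncached memory):
because the hardware completes descriptors in order, the flags of the
outstanding slots form a monotonic run of 1s followed by 0s, so the
boundary can be found in O(log n) flag reads instead of O(n).

```c
#include <assert.h>

#define RING_SIZE 64

int done[RING_SIZE];	/* models the per-descriptor completion flag */

/* number of completed descriptors among the n outstanding from tail */
unsigned int count_done(unsigned int tail, unsigned int n)
{
	unsigned int lo = 0, hi = n;

	while (lo < hi) {
		unsigned int mid = lo + (hi - lo) / 2;

		if (done[(tail + mid) % RING_SIZE])
			lo = mid + 1;	/* finished: boundary is later */
		else
			hi = mid;	/* pending: boundary is earlier */
	}
	return lo;
}
```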

> > The same would be true for the rx descriptors.
> 
> Actually it is reasonably feasible to put the rx descriptors
> in cacheable memory and to flush the cache lines after adding
> new entries.
> You just need to add the entries one cache line full at a time
> (and ensure that the rx processing code doesn't dirty the line).

rx descriptors are already using the streaming mapping, so ignore
what I said about them.

	Arnd

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-02  9:51         ` zhangfei
@ 2014-04-02 15:24           ` Arnd Bergmann
  -1 siblings, 0 replies; 148+ messages in thread
From: Arnd Bergmann @ 2014-04-02 15:24 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: zhangfei, mark.rutland, devicetree, f.fainelli, linux,
	eric.dumazet, sergei.shtylyov, netdev, David.Laight, davem

On Wednesday 02 April 2014 17:51:54 zhangfei wrote:
> Dear Arnd
> 
> On 04/02/2014 05:21 PM, Arnd Bergmann wrote:
> > On Tuesday 01 April 2014 21:27:12 Zhangfei Gao wrote:
> >> +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
> >
> > While it looks like there are no serious functionality bugs left, this
> > function is rather inefficient, as has been pointed out before:
> 
> Yes, we still need more performance tuning in the next step.
> We need to enable the hardware cache-flush feature, with the help of
> the arm-smmu, so that dma_map_single etc. can be removed.

You cannot remove the dma_map_single call here, but the implementation
of that function will be different when you use the iommu_coherent_ops:
Instead of flushing the caches, it will create or remove an iommu entry
and return the bus address.

I remember you mentioned before that using the iommu on this particular
SoC actually gives you cache-coherent DMA, so you may also be able
to use arm_coherent_dma_ops if you can set up a static 1:1 mapping 
between bus and phys addresses.

> >> +{
> >> +       struct hip04_priv *priv = netdev_priv(ndev);
> >> +       struct net_device_stats *stats = &ndev->stats;
> >> +       unsigned int tx_head = priv->tx_head;
> >> +       struct tx_desc *desc = &priv->tx_desc[tx_head];
> >> +       dma_addr_t phys;
> >> +
> >> +       hip04_tx_reclaim(ndev, false);
> >> +       mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
> >> +
> >> +       if (priv->tx_count >= TX_DESC_NUM) {
> >> +               netif_stop_queue(ndev);
> >> +               return NETDEV_TX_BUSY;
> >> +       }
> >
> > This is where you have two problems:
> >
> > - if the descriptor ring is full, you wait for RECLAIM_PERIOD,
> >    which is far too long at 500ms, because during that time you
> >    are not able to add further data to the stopped queue.
> 
> Understand
> The idea here is to avoid using the timer as much as possible.
> As experiments show, the best throughput is achieved when only
> xmit reclaims the buffers.

I'm only talking about the case where that doesn't work: once you stop
the queue, the xmit function won't get called again until the timer
causes the reclaim to be done and restarts the queue.

> > - As David Laight pointed out earlier, you must also ensure that
> >    you don't have too much /data/ pending in the descriptor ring
> >    when you stop the queue. For a 10mbit connection, you have already
> >    tested (as we discussed on IRC) that 64 descriptors with 1500 byte
> >    frames gives you a 68ms round-trip ping time, which is too much.
> 
> When iperf & ping are running together, and with only ping, it is 0.7 ms.
> 
> >    Conversely, on 1gbit, having only 64 descriptors actually seems
> >    a little low, and you may be able to get better throughput if
> >    you extend the ring to e.g. 512 descriptors.
> 
> OK, will check the throughput after increasing the xmit descriptors.
> But wasn't it said not to use too many descriptors for xmit, since
> there is no xmit interrupt?

The important part is to limit the time that data spends in the queue,
which is a function of the interface tx speed and the number of bytes
in the queue.
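
A minimal userspace model of that byte-based limit (the 2 ms budget and
the function names are illustrative, not from the driver; mainline
drivers get this adaptively from the BQL helpers netdev_sent_queue() /
netdev_completed_queue()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stop the queue once more data is in flight than the link can drain
 * within a small latency budget, rather than only when the descriptor
 * ring itself is full. */
#define MAX_LATENCY_US 2000UL	/* illustrative latency budget */

unsigned long bytes_in_flight;

/* account skb_len bytes and decide whether to stop the tx queue;
 * link_bps is the negotiated link speed in bits per second */
bool queue_should_stop(unsigned long link_bps, unsigned int skb_len)
{
	/* bytes the link can transmit within the latency budget */
	uint64_t budget = (uint64_t)link_bps / 8 * MAX_LATENCY_US / 1000000;

	bytes_in_flight += skb_len;
	return bytes_in_flight >= budget;
}
```

At 10 Mbit/s this budget works out to 2500 bytes (under two full-size
frames), while at 1 Gbit/s it is 250000 bytes, matching the observation
that 64 descriptors is far too much queue at 10 Mbit but on the low side
at gigabit speed.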

> >> +       phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
> >> +       if (dma_mapping_error(&ndev->dev, phys)) {
> >> +               dev_kfree_skb(skb);
> >> +               return NETDEV_TX_OK;
> >> +       }
> >> +
> >> +       priv->tx_skb[tx_head] = skb;
> >> +       priv->tx_phys[tx_head] = phys;
> >> +       desc->send_addr = cpu_to_be32(phys);
> >> +       desc->send_size = cpu_to_be16(skb->len);
> >> +       desc->cfg = cpu_to_be32(DESC_DEF_CFG);
> >> +       phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
> >> +       desc->wb_addr = cpu_to_be32(phys);
> >
> > One detail: since you don't have cache-coherent DMA, "desc" will
> > reside in uncached memory, so you try to minimize the number of accesses.
> > It's probably faster if you build the descriptor on the stack and
> > then atomically copy it over, rather than assigning each member at
> > a time.
> 
> I am sorry, I don't quite understand, could you clarify more?
> The phys and size etc. of skb->data change, so they need to be assigned.
> If a member's contents stay constant, it can be set when initializing.

I meant you should use 64-bit accesses here instead of multiple 32- and
16-bit accesses, but as David noted, it is not as much of a problem for
the writes as it is for the reads from uncached memory.

The important part is to avoid the line where you do 'if (desc->send_addr
!= 0)' as much as possible.

	Arnd

^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-02  9:21       ` Arnd Bergmann
@ 2014-04-02 10:04         ` David Laight
  -1 siblings, 0 replies; 148+ messages in thread
From: David Laight @ 2014-04-02 10:04 UTC (permalink / raw)
  To: 'Arnd Bergmann', Zhangfei Gao
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, linux-lFZ/pmaqli7XmaaqVzeoHQ,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w,
	sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
	mark.rutland-5wv7dgnIgG8, eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA

From: Arnd Bergmann
> On Tuesday 01 April 2014 21:27:12 Zhangfei Gao wrote:
> > +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
> 
> While it looks like there are no serious functionality bugs left, this
> function is rather inefficient, as has been pointed out before:
> 
> > +{
> > +       struct hip04_priv *priv = netdev_priv(ndev);
> > +       struct net_device_stats *stats = &ndev->stats;
> > +       unsigned int tx_head = priv->tx_head;
> > +       struct tx_desc *desc = &priv->tx_desc[tx_head];
> > +       dma_addr_t phys;
> > +
> > +       hip04_tx_reclaim(ndev, false);
> > +       mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
> > +
> > +       if (priv->tx_count >= TX_DESC_NUM) {
> > +               netif_stop_queue(ndev);
> > +               return NETDEV_TX_BUSY;
> > +       }
> 
> This is where you have two problems:
> 
> - if the descriptor ring is full, you wait for RECLAIM_PERIOD,
>   which is far too long at 500ms, because during that time you
>   are not able to add further data to the stopped queue.

Best to have some idea how long it will take for the ring to empty.
IIRC you need a count of the bytes in the tx ring anyway.
There isn't much point waking up until most of the queued
transmits have had time to complete.

> - As David Laight pointed out earlier, you must also ensure that
>   you don't have too much /data/ pending in the descriptor ring
>   when you stop the queue. For a 10mbit connection, you have already
>   tested (as we discussed on IRC) that 64 descriptors with 1500 byte
>   frames gives you a 68ms round-trip ping time, which is too much.
>   Conversely, on 1gbit, having only 64 descriptors actually seems
>   a little low, and you may be able to get better throughput if
>   you extend the ring to e.g. 512 descriptors.

The descriptor count matters most for small packets.
There are workloads (I've got one) that can send 1000s of small packets/sec
on a single TCP connection (there will be receive traffic).

> > +       phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
> > +       if (dma_mapping_error(&ndev->dev, phys)) {
> > +               dev_kfree_skb(skb);
> > +               return NETDEV_TX_OK;
> > +       }
> > +
> > +       priv->tx_skb[tx_head] = skb;
> > +       priv->tx_phys[tx_head] = phys;
> > +       desc->send_addr = cpu_to_be32(phys);
> > +       desc->send_size = cpu_to_be16(skb->len);
> > +       desc->cfg = cpu_to_be32(DESC_DEF_CFG);
> > +       phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
> > +       desc->wb_addr = cpu_to_be32(phys);
> 
> One detail: since you don't have cache-coherent DMA, "desc" will
> reside in uncached memory, so you try to minimize the number of accesses.
> It's probably faster if you build the descriptor on the stack and
> then atomically copy it over, rather than assigning each member at
> a time.

I'm not sure, the writes to uncached memory will probably be
asynchronous, but you may avoid a stall by separating the
cycles in time.
What you need to avoid is reads from uncached memory.
It may well be beneficial for the tx reclaim code to first
check whether all the transmits have completed (likely)
instead of testing each descriptor in turn.
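
A sketch of that idea (hypothetical: it assumes the hardware maintains a single completion counter, which hip04 may not expose): reclaim compares one cheap counter against the software tail and frees that many entries without reading any uncached descriptor status words.

```c
#include <assert.h>

/* Hypothetical bulk tx reclaim driven by a single completion counter,
 * instead of polling each (uncached) descriptor's status in turn. */
#define TX_DESC_NUM 64
#define TX_NEXT(n) (((n) + 1) & (TX_DESC_NUM - 1))

struct ring {
	unsigned int head;     /* next slot to fill */
	unsigned int tail;     /* oldest unreclaimed slot */
	unsigned int hw_done;  /* completions reported by hardware */
	unsigned int sw_done;  /* completions already reclaimed */
};

/* Returns how many entries were reclaimed in one pass. */
static unsigned int ring_reclaim(struct ring *r)
{
	unsigned int n = r->hw_done - r->sw_done; /* one cheap read */
	unsigned int i;

	for (i = 0; i < n; i++)
		r->tail = TX_NEXT(r->tail);       /* free skb, unmap, ... */
	r->sw_done = r->hw_done;
	return n;
}
```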

> The same would be true for the rx descriptors.

Actually it is reasonably feasible to put the rx descriptors
in cacheable memory and to flush the cache lines after adding
new entries.
You just need to add the entries one cache line full at a time
(and ensure that the rx processing code doesn't dirty the line).
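
The batching can be sketched like this (sizes are assumptions for illustration; the real cache line and descriptor sizes depend on the SoC): with 32-byte descriptors and 64-byte cache lines, two descriptors share a line, so new entries are published, i.e. the line is flushed for the hardware, only when the write index crosses a line boundary.

```c
#include <assert.h>

/* Hypothetical cache-line batching for rx descriptor publishing.
 * Assumed sizes: 64-byte cache lines, 32-byte rx descriptors. */
#define CACHE_LINE_SIZE 64
#define RX_DESC_SIZE    32
#define DESC_PER_LINE   (CACHE_LINE_SIZE / RX_DESC_SIZE)

/* True when descriptor idx is the last one in its cache line, i.e. the
 * line is now full and may be flushed for the hardware to read. */
static int rx_line_full(unsigned int idx)
{
	return ((idx + 1) % DESC_PER_LINE) == 0;
}
```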

Without cache-coherent memory, cached tx descriptors are much harder work.

	David

^ permalink raw reply	[flat|nested] 148+ messages in thread


* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-02  9:21       ` Arnd Bergmann
@ 2014-04-02  9:51         ` zhangfei
  -1 siblings, 0 replies; 148+ messages in thread
From: zhangfei @ 2014-04-02  9:51 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, linux-lFZ/pmaqli7XmaaqVzeoHQ,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w,
	sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
	mark.rutland-5wv7dgnIgG8, David.Laight-JxhZ9S5GRejQT0dZR+AlfA,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA

Dear Arnd

On 04/02/2014 05:21 PM, Arnd Bergmann wrote:
> On Tuesday 01 April 2014 21:27:12 Zhangfei Gao wrote:
>> +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>
> While it looks like there are no serious functionality bugs left, this
> function is rather inefficient, as has been pointed out before:

Yes, more performance tuning is still needed as a next step.
We need to enable the hardware cache-flush feature, with the help of
the arm-smmu; dma_map_single() etc. can then be removed.

>
>> +{
>> +       struct hip04_priv *priv = netdev_priv(ndev);
>> +       struct net_device_stats *stats = &ndev->stats;
>> +       unsigned int tx_head = priv->tx_head;
>> +       struct tx_desc *desc = &priv->tx_desc[tx_head];
>> +       dma_addr_t phys;
>> +
>> +       hip04_tx_reclaim(ndev, false);
>> +       mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
>> +
>> +       if (priv->tx_count >= TX_DESC_NUM) {
>> +               netif_stop_queue(ndev);
>> +               return NETDEV_TX_BUSY;
>> +       }
>
> This is where you have two problems:
>
> - if the descriptor ring is full, you wait for RECLAIM_PERIOD,
>    which is far too long at 500ms, because during that time you
>    are not able to add further data to the stopped queue.

Understood.
The idea here is to rely on the timer as little as possible.
Experiments show that the best throughput is achieved when buffers are
reclaimed only from the xmit path.

>
> - As David Laight pointed out earlier, you must also ensure that
>    you don't have too much /data/ pending in the descriptor ring
>    when you stop the queue. For a 10mbit connection, you have already
>    tested (as we discussed on IRC) that 64 descriptors with 1500 byte
>    frames gives you a 68ms round-trip ping time, which is too much.

When iperf & ping are running together, and with only ping, it is 0.7 ms.

>    Conversely, on 1gbit, having only 64 descriptors actually seems
>    a little low, and you may be able to get better throughput if
>    you extend the ring to e.g. 512 descriptors.

OK, I will check the throughput after increasing the number of xmit
descriptors.
But wasn't it said not to use too many descriptors for xmit, since there
is no xmit interrupt?

>
>> +       phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
>> +       if (dma_mapping_error(&ndev->dev, phys)) {
>> +               dev_kfree_skb(skb);
>> +               return NETDEV_TX_OK;
>> +       }
>> +
>> +       priv->tx_skb[tx_head] = skb;
>> +       priv->tx_phys[tx_head] = phys;
>> +       desc->send_addr = cpu_to_be32(phys);
>> +       desc->send_size = cpu_to_be16(skb->len);
>> +       desc->cfg = cpu_to_be32(DESC_DEF_CFG);
>> +       phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
>> +       desc->wb_addr = cpu_to_be32(phys);
>
> One detail: since you don't have cache-coherent DMA, "desc" will
> reside in uncached memory, so you try to minimize the number of accesses.
> It's probably faster if you build the descriptor on the stack and
> then atomically copy it over, rather than assigning each member at
> a time.

I am sorry, I don't quite understand; could you clarify further?
The phys and size etc. of skb->data change for each packet, so they need
to be assigned each time.
If the member contents were constant, they could be set at initialization.

>
> The same would be true for the rx descriptors.
>

Thanks

^ permalink raw reply	[flat|nested] 148+ messages in thread


* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-01 13:27   ` Zhangfei Gao
@ 2014-04-02  9:21       ` Arnd Bergmann
  -1 siblings, 0 replies; 148+ messages in thread
From: Arnd Bergmann @ 2014-04-02  9:21 UTC (permalink / raw)
  To: Zhangfei Gao
  Cc: davem-fT/PcQaiUtIeIZ0/mPfg9Q, linux-lFZ/pmaqli7XmaaqVzeoHQ,
	f.fainelli-Re5JQEeQqe8AvxtiuMwx3w,
	sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
	mark.rutland-5wv7dgnIgG8, David.Laight-JxhZ9S5GRejQT0dZR+AlfA,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA

On Tuesday 01 April 2014 21:27:12 Zhangfei Gao wrote:
> +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)

While it looks like there are no serious functionality bugs left, this
function is rather inefficient, as has been pointed out before:

> +{
> +       struct hip04_priv *priv = netdev_priv(ndev);
> +       struct net_device_stats *stats = &ndev->stats;
> +       unsigned int tx_head = priv->tx_head;
> +       struct tx_desc *desc = &priv->tx_desc[tx_head];
> +       dma_addr_t phys;
> +
> +       hip04_tx_reclaim(ndev, false);
> +       mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
> +
> +       if (priv->tx_count >= TX_DESC_NUM) {
> +               netif_stop_queue(ndev);
> +               return NETDEV_TX_BUSY;
> +       }

This is where you have two problems:

- if the descriptor ring is full, you wait for RECLAIM_PERIOD,
  which is far too long at 500ms, because during that time you
  are not able to add further data to the stopped queue.

- As David Laight pointed out earlier, you must also ensure that
  you don't have too much /data/ pending in the descriptor ring
  when you stop the queue. For a 10mbit connection, you have already
  tested (as we discussed on IRC) that 64 descriptors with 1500 byte
  frames gives you a 68ms round-trip ping time, which is too much.
  Conversely, on 1gbit, having only 64 descriptors actually seems
  a little low, and you may be able to get better throughput if
  you extend the ring to e.g. 512 descriptors.
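
One practical constraint on extending the ring: the driver advances indices with a mask, so the ring size must remain a power of two. A standalone copy of that pattern (mirroring the driver's TX_NEXT() macro, with the suggested 512-entry size):

```c
#include <assert.h>

/* Power-of-two ring indexing: the mask only wraps correctly if
 * TX_DESC_NUM is a power of two (here the suggested 512). */
#define TX_DESC_NUM 512
#define TX_NEXT(n) (((n) + 1) & (TX_DESC_NUM - 1))
```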

> +       phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
> +       if (dma_mapping_error(&ndev->dev, phys)) {
> +               dev_kfree_skb(skb);
> +               return NETDEV_TX_OK;
> +       }
> +
> +       priv->tx_skb[tx_head] = skb;
> +       priv->tx_phys[tx_head] = phys;
> +       desc->send_addr = cpu_to_be32(phys);
> +       desc->send_size = cpu_to_be16(skb->len);
> +       desc->cfg = cpu_to_be32(DESC_DEF_CFG);
> +       phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
> +       desc->wb_addr = cpu_to_be32(phys);

One detail: since you don't have cache-coherent DMA, "desc" will
reside in uncached memory, so you try to minimize the number of accesses.
It's probably faster if you build the descriptor on the stack and
then atomically copy it over, rather than assigning each member at
a time.

The same would be true for the rx descriptors.
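
A minimal sketch of the stack-build approach (byte-order conversions omitted for brevity; the "uncached" slot here is ordinary memory standing in for the DMA region): fill the descriptor in cached stack memory, then copy it to the ring slot in one operation, so the uncached memory sees a single burst of writes instead of four separate stores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Layout mirrors the driver's struct tx_desc. */
struct tx_desc {
	uint32_t send_addr;
	uint16_t reserved_16;
	uint16_t send_size;
	uint32_t reserved_32;
	uint32_t cfg;
	uint32_t wb_addr;
};

static void fill_desc(struct tx_desc *slot, uint32_t addr, uint16_t size,
		      uint32_t cfg, uint32_t wb)
{
	struct tx_desc d = { 0 };  /* built in cached stack memory */

	d.send_addr = addr;
	d.send_size = size;
	d.cfg = cfg;
	d.wb_addr = wb;
	memcpy(slot, &d, sizeof(d)); /* one copy into the uncached slot */
}
```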

> +       skb_tx_timestamp(skb);
> +
> +       /* Don't wait up for transmitted skbs to be freed. */
> +       skb_orphan(skb);
> +
> +       hip04_set_xmit_desc(priv, phys);
> +       priv->tx_head = TX_NEXT(tx_head);
> +
> +       stats->tx_bytes += skb->len;
> +       stats->tx_packets++;
> +       priv->tx_count++;
> +
> +       return NETDEV_TX_OK;
> +}

	Arnd

^ permalink raw reply	[flat|nested] 148+ messages in thread


* [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-04-01 13:27 [PATCH v5 0/3] add hisilicon " Zhangfei Gao
@ 2014-04-01 13:27   ` Zhangfei Gao
  0 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-04-01 13:27 UTC (permalink / raw)
  To: davem, linux, arnd, f.fainelli, sergei.shtylyov, mark.rutland,
	David.Laight, eric.dumazet
  Cc: linux-arm-kernel, netdev, devicetree, Zhangfei Gao

Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
---
 drivers/net/ethernet/hisilicon/Makefile    |    2 +-
 drivers/net/ethernet/hisilicon/hip04_eth.c |  771 ++++++++++++++++++++++++++++
 2 files changed, 772 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c

diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 1d6eb6e..17dec03 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -2,4 +2,4 @@
 # Makefile for the HISILICON network device drivers.
 #
 
-obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
+obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
new file mode 100644
index 0000000..db00337
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -0,0 +1,771 @@
+
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+#include <linux/of_address.h>
+#include <linux/phy.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
+#include <linux/mfd/syscon.h>
+#include <linux/regmap.h>
+
+#define PPE_CFG_RX_ADDR			0x100
+#define PPE_CFG_POOL_GRP		0x300
+#define PPE_CFG_RX_BUF_SIZE		0x400
+#define PPE_CFG_RX_FIFO_SIZE		0x500
+#define PPE_CURR_BUF_CNT		0xa200
+
+#define GE_DUPLEX_TYPE			0x8
+#define GE_MAX_FRM_SIZE_REG		0x3c
+#define GE_PORT_MODE			0x40
+#define GE_PORT_EN			0x44
+#define GE_SHORT_RUNTS_THR_REG		0x50
+#define GE_TX_LOCAL_PAGE_REG		0x5c
+#define GE_TRANSMIT_CONTROL_REG		0x60
+#define GE_CF_CRC_STRIP_REG		0x1b0
+#define GE_MODE_CHANGE_EN		0x1b4
+#define GE_RECV_CONTROL_REG		0x1e0
+#define GE_STATION_MAC_ADDRESS		0x210
+#define PPE_CFG_TX_PKT_BD_ADDR		0x420
+#define PPE_CFG_MAX_FRAME_LEN_REG	0x408
+#define PPE_CFG_BUS_CTRL_REG		0x424
+#define PPE_CFG_RX_CTRL_REG		0x428
+#define PPE_CFG_RX_PKT_MODE_REG		0x438
+#define PPE_CFG_QOS_VMID_GEN		0x500
+#define PPE_CFG_RX_PKT_INT		0x538
+#define PPE_INTEN			0x600
+#define PPE_INTSTS			0x608
+#define PPE_RINT			0x604
+#define PPE_CFG_STS_MODE		0x700
+#define PPE_HIS_RX_PKT_CNT		0x804
+
+/* REG_INTERRUPT */
+#define RCV_INT				BIT(10)
+#define RCV_NOBUF			BIT(8)
+#define DEF_INT_MASK			(RCV_INT | RCV_NOBUF)
+
+#define RX_DESC_NUM			64
+#define TX_DESC_NUM			64
+#define TX_NEXT(N)			(((N) + 1) & (TX_DESC_NUM-1))
+#define RX_NEXT(N)			(((N) + 1) & (RX_DESC_NUM-1))
+
+#define GMAC_PPE_RX_PKT_MAX_LEN		379
+#define GMAC_MAX_PKT_LEN		1516
+#define DESC_DEF_CFG			0x14
+#define RX_BUF_SIZE			1600
+#define RX_PKT_ERR			0x3
+#define TX_TIMEOUT			(6 * HZ)
+#define RECLAIM_PERIOD			HZ
+
+#define DRV_NAME			"hip04-ether"
+
+struct tx_desc {
+	u32 send_addr;
+	u16 reserved_16;
+	u16 send_size;
+	u32 reserved_32;
+	u32 cfg;
+	u32 wb_addr;
+} ____cacheline_aligned;
+
+struct rx_desc {
+	u16 reserved_16;
+	u16 pkt_len;
+	u32 reserve1[3];
+	u32 pkt_err;
+	u32 reserve2[4];
+};
+
+struct hip04_priv {
+	void __iomem *base;
+	int phy_mode;
+	int chan;
+	unsigned int port;
+	unsigned int speed;
+	unsigned int duplex;
+	unsigned int reg_inten;
+
+	struct napi_struct napi;
+	struct net_device *ndev;
+
+	struct tx_desc *tx_desc;
+	dma_addr_t tx_desc_dma;
+	struct sk_buff *tx_skb[TX_DESC_NUM];
+	dma_addr_t tx_phys[TX_DESC_NUM];
+	spinlock_t lock;
+	unsigned int tx_head;
+	unsigned int tx_tail;
+	unsigned int tx_count;
+
+	unsigned char *rx_buf[RX_DESC_NUM];
+	dma_addr_t rx_phys[RX_DESC_NUM];
+	unsigned int rx_head;
+	unsigned int rx_buf_size;
+
+	struct device_node *phy_node;
+	struct phy_device *phy;
+	struct regmap *map;
+	struct timer_list txtimer;
+};
+
+static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	priv->speed = speed;
+	priv->duplex = duplex;
+
+	switch (priv->phy_mode) {
+	case PHY_INTERFACE_MODE_SGMII:
+		if (speed == SPEED_1000)
+			val = 8;
+		else if (speed == SPEED_100)
+			val = 7;
+		else
+			val = 6;	/* SPEED_10 */
+		break;
+	case PHY_INTERFACE_MODE_MII:
+		if (speed == SPEED_100)
+			val = 1;
+		else
+			val = 0;	/* SPEED_10 */
+		break;
+	default:
+		netdev_warn(ndev, "not supported mode\n");
+		val = 0;
+		break;
+	}
+	writel_relaxed(val, priv->base + GE_PORT_MODE);
+
+	val = (duplex) ? BIT(0) : 0;
+	writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
+
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_MODE_CHANGE_EN);
+}
+
+static void hip04_reset_ppe(struct hip04_priv *priv)
+{
+	u32 val, tmp;
+
+	do {
+		regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
+		regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
+	} while (val & 0xfff);
+}
+
+static void hip04_config_fifo(struct hip04_priv *priv)
+{
+	u32 val;
+
+	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
+	val |= BIT(12);			/* PPE_HIS_RX_PKT_CNT read clear */
+	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
+
+	val = BIT(priv->port);
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
+
+	val = priv->port << 8;
+	val |= BIT(14);
+	writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
+
+	val = RX_BUF_SIZE;
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
+
+	val = RX_DESC_NUM << 16;	/* depth */
+	val |= BIT(11);			/* seq: first set first use */
+	val |= RX_DESC_NUM * priv->chan;	/* start_addr */
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_FIFO_SIZE, val);
+
+	/* pkt store format */
+	val = NET_IP_ALIGN << 11;	/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
+
+	/* following cfg required for 1000M */
+	/* pkt mode */
+	val = BIT(18);			/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
+
+	/* set bus ctrl */
+	val = BIT(14);			/* buffer locally release */
+	val |= BIT(0);			/* big endian */
+	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
+
+	/* set max pkt len, curtail if exceed */
+	val = GMAC_PPE_RX_PKT_MAX_LEN;	/* max buffer len */
+	writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
+
+	/* set max len of each pkt */
+	val = GMAC_MAX_PKT_LEN;		/* max buffer len */
+	writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
+
+	/* set min len of each pkt */
+	val = 31;			/* min buffer len */
+	writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
+
+	/* tx */
+	val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
+	val |= BIT(5);			/* tx auto neg */
+	val |= BIT(6);			/* tx add crc */
+	val |= BIT(7);			/* tx short pad through */
+	writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
+
+	/* rx crc */
+	val = BIT(0);			/* rx strip crc */
+	writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
+
+	/* rx */
+	val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
+	val |= BIT(3);			/* rx strip pad */
+	val |= BIT(4);			/* run pkt en */
+	writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
+
+	/* auto neg control */
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
+}
+
+static void hip04_mac_enable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* enable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val |= BIT(1);		/* rx */
+	val |= BIT(2);		/* tx */
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+
+	/* clear rx int */
+	val = RCV_INT;
+	writel_relaxed(val, priv->base + PPE_RINT);
+
+	/* config recv int */
+	val = BIT(6);		/* int threshold 1 package */
+	val |= 0x4;		/* recv timeout */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
+
+	/* enable interrupt */
+	priv->reg_inten = DEF_INT_MASK;
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+}
+
+static void hip04_mac_disable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* disable int */
+	priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+	/* disable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val &= ~(BIT(1));	/* rx */
+	val &= ~(BIT(2));	/* tx */
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+}
+
+static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	writel(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
+}
+
+static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
+}
+
+static u32 hip04_recv_cnt(struct hip04_priv *priv)
+{
+	return readl(priv->base + PPE_HIS_RX_PKT_CNT);
+}
+
+static void hip04_update_mac_address(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
+			priv->base + GE_STATION_MAC_ADDRESS);
+	writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
+			(ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
+			priv->base + GE_STATION_MAC_ADDRESS + 4);
+}
+
+static int hip04_set_mac_address(struct net_device *ndev, void *addr)
+{
+	eth_mac_addr(ndev, addr);
+	hip04_update_mac_address(ndev);
+	return 0;
+}
+
+static void hip04_tx_reclaim(struct net_device *ndev, bool force)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	unsigned tx_head = priv->tx_head;
+	unsigned tx_tail = priv->tx_tail;
+	struct tx_desc *desc;
+
+	spin_lock_bh(&priv->lock);
+	while ((tx_tail != tx_head) || (priv->tx_count == TX_DESC_NUM)) {
+		desc = &priv->tx_desc[priv->tx_tail];
+		if (desc->send_addr != 0) {
+			if (force)
+				desc->send_addr = 0;
+			else
+				break;
+		}
+		if (priv->tx_phys[tx_tail]) {
+			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
+				priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
+			priv->tx_phys[tx_tail] = 0;
+		}
+		dev_kfree_skb(priv->tx_skb[tx_tail]);
+		priv->tx_skb[tx_tail] = NULL;
+		tx_tail = TX_NEXT(tx_tail);
+		priv->tx_count--;
+	}
+	priv->tx_tail = tx_tail;
+	spin_unlock_bh(&priv->lock);
+
+	if (priv->tx_count)
+		mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
+
+	if (unlikely(netif_queue_stopped(ndev)) &&
+		(priv->tx_count < TX_DESC_NUM))
+		netif_wake_queue(ndev);
+}
+
+static void hip04_xmit_timer(unsigned long data)
+{
+	struct net_device *ndev = (void *) data;
+
+	hip04_tx_reclaim(ndev, false);
+}
+
+static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int tx_head = priv->tx_head;
+	struct tx_desc *desc = &priv->tx_desc[tx_head];
+	dma_addr_t phys;
+
+	hip04_tx_reclaim(ndev, false);
+	mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
+
+	if (priv->tx_count >= TX_DESC_NUM) {
+		netif_stop_queue(ndev);
+		return NETDEV_TX_BUSY;
+	}
+
+	phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
+	if (dma_mapping_error(&ndev->dev, phys)) {
+		dev_kfree_skb(skb);
+		return NETDEV_TX_OK;
+	}
+
+	priv->tx_skb[tx_head] = skb;
+	priv->tx_phys[tx_head] = phys;
+	desc->send_addr = cpu_to_be32(phys);
+	desc->send_size = cpu_to_be16(skb->len);
+	desc->cfg = cpu_to_be32(DESC_DEF_CFG);
+	phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
+	desc->wb_addr = cpu_to_be32(phys);
+	skb_tx_timestamp(skb);
+
+	/* Don't wait up for transmitted skbs to be freed. */
+	skb_orphan(skb);
+
+	hip04_set_xmit_desc(priv, phys);
+	priv->tx_head = TX_NEXT(tx_head);
+
+	stats->tx_bytes += skb->len;
+	stats->tx_packets++;
+	priv->tx_count++;
+
+	return NETDEV_TX_OK;
+}
+
+static int hip04_rx_poll(struct napi_struct *napi, int budget)
+{
+	struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
+	struct net_device *ndev = priv->ndev;
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int cnt = hip04_recv_cnt(priv);
+	struct rx_desc *desc;
+	struct sk_buff *skb;
+	unsigned char *buf;
+	bool last = false;
+	dma_addr_t phys;
+	int rx = 0;
+	u16 len;
+	u32 err;
+
+	while (cnt && !last) {
+		buf = priv->rx_buf[priv->rx_head];
+		skb = build_skb(buf, priv->rx_buf_size);
+		if (unlikely(!skb)) {
+			net_dbg_ratelimited("build_skb failed\n");
+			break;
+		}
+
+		dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_phys[priv->rx_head] = 0;
+
+		desc = (struct rx_desc *)skb->data;
+		len = be16_to_cpu(desc->pkt_len);
+		err = be32_to_cpu(desc->pkt_err);
+
+		if (len > RX_BUF_SIZE)
+			len = RX_BUF_SIZE;
+
+		if (len == 0) {
+			dev_kfree_skb_any(skb);
+			last = true;
+		} else if (err & RX_PKT_ERR) {
+			dev_kfree_skb_any(skb);
+			stats->rx_dropped++;
+			stats->rx_errors++;
+		} else {
+			skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+			skb_put(skb, len);
+			skb->protocol = eth_type_trans(skb, ndev);
+			napi_gro_receive(&priv->napi, skb);
+			stats->rx_packets++;
+			stats->rx_bytes += len;
+			rx++;
+		}
+
+		buf = netdev_alloc_frag(priv->rx_buf_size);
+		if (!buf)
+			return -ENOMEM;
+		phys = dma_map_single(&ndev->dev, buf,
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			return -EIO;
+		priv->rx_buf[priv->rx_head] = buf;
+		priv->rx_phys[priv->rx_head] = phys;
+		hip04_set_recv_desc(priv, phys);
+
+		priv->rx_head = RX_NEXT(priv->rx_head);
+		if (rx >= budget)
+			break;
+
+		if (--cnt == 0)
+			cnt = hip04_recv_cnt(priv);
+	}
+
+	if (rx < budget) {
+		napi_complete(napi);
+
+		/* enable rx interrupt */
+		priv->reg_inten |= RCV_INT | RCV_NOBUF;
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+	}
+
+	return rx;
+}
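
hip04_rx_poll follows the standard NAPI contract: consume at most `budget` packets, and only when it finishes under budget call napi_complete() and unmask the receive interrupt. The completion decision reduces to this sketch (illustrative only, not driver code):

```c
#include <assert.h>

/* Consume up to 'budget' of 'pending' packets. On return, *irq_on tells
 * whether the poll completed under budget (interrupts re-enabled) or
 * whether NAPI will poll again. */
static int poll_budget(int pending, int budget, int *irq_on)
{
	int rx = 0;

	while (pending > 0 && rx < budget) {
		pending--;
		rx++;
	}
	*irq_on = (rx < budget);
	return rx;
}
```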
+
+static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
+{
+	struct net_device *ndev = (struct net_device *) dev_id;
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
+
+	writel_relaxed(DEF_INT_MASK, priv->base + PPE_RINT);
+
+	if (ists & (RCV_INT | RCV_NOBUF)) {
+		/* disable rx interrupt */
+		priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+		napi_schedule(&priv->napi);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void hip04_adjust_link(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct phy_device *phy = priv->phy;
+
+	if ((priv->speed != phy->speed) || (priv->duplex != phy->duplex)) {
+		hip04_config_port(ndev, phy->speed, phy->duplex);
+		phy_print_status(phy);
+	}
+}
+
+static int hip04_mac_open(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->rx_head = 0;
+	priv->tx_head = 0;
+	priv->tx_tail = 0;
+	priv->tx_count = 0;
+
+	hip04_reset_ppe(priv);
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		dma_addr_t phys;
+
+		phys = dma_map_single(&ndev->dev, priv->rx_buf[i],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			return -EIO;
+
+		priv->rx_phys[i] = phys;
+		hip04_set_recv_desc(priv, phys);
+	}
+
+	if (priv->phy)
+		phy_start(priv->phy);
+
+	netif_start_queue(ndev);
+	hip04_mac_enable(ndev);
+	napi_enable(&priv->napi);
+
+	return 0;
+}
+
+static int hip04_mac_stop(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	napi_disable(&priv->napi);
+	netif_stop_queue(ndev);
+	hip04_mac_disable(ndev);
+	hip04_tx_reclaim(ndev, true);
+	hip04_reset_ppe(priv);
+
+	if (priv->phy)
+		phy_stop(priv->phy);
+
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		if (priv->rx_phys[i]) {
+			dma_unmap_single(&ndev->dev, priv->rx_phys[i],
+					RX_BUF_SIZE, DMA_FROM_DEVICE);
+			priv->rx_phys[i] = 0;
+		}
+	}
+
+	return 0;
+}
+
+static void hip04_timeout(struct net_device *ndev)
+{
+	hip04_mac_stop(ndev);
+	hip04_mac_open(ndev);
+}
+
+static const struct net_device_ops hip04_netdev_ops = {
+	.ndo_open		= hip04_mac_open,
+	.ndo_stop		= hip04_mac_stop,
+	.ndo_start_xmit		= hip04_mac_start_xmit,
+	.ndo_set_mac_address	= hip04_set_mac_address,
+	.ndo_tx_timeout         = hip04_timeout,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_change_mtu		= eth_change_mtu,
+};
+
+static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->tx_desc = dma_alloc_coherent(d,
+			TX_DESC_NUM * sizeof(struct tx_desc),
+			&priv->tx_desc_dma, GFP_KERNEL);
+	if (!priv->tx_desc)
+		return -ENOMEM;
+
+	priv->rx_buf_size = RX_BUF_SIZE +
+			    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		priv->rx_buf[i] = netdev_alloc_frag(priv->rx_buf_size);
+		if (!priv->rx_buf[i])
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void hip04_free_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	for (i = 0; i < RX_DESC_NUM; i++)
+		if (priv->rx_buf[i])
+			put_page(virt_to_head_page(priv->rx_buf[i]));
+
+	for (i = 0; i < TX_DESC_NUM; i++)
+		if (priv->tx_skb[i])
+			dev_kfree_skb_any(priv->tx_skb[i]);
+
+	dma_free_coherent(d, TX_DESC_NUM * sizeof(struct tx_desc),
+			priv->tx_desc, priv->tx_desc_dma);
+}
+
+static int hip04_mac_probe(struct platform_device *pdev)
+{
+	struct device *d = &pdev->dev;
+	struct device_node *node = d->of_node;
+	struct of_phandle_args arg;
+	struct net_device *ndev;
+	struct hip04_priv *priv;
+	struct resource *res;
+	unsigned int irq;
+	int ret;
+
+	ndev = alloc_etherdev(sizeof(struct hip04_priv));
+	if (!ndev)
+		return -ENOMEM;
+
+	priv = netdev_priv(ndev);
+	priv->ndev = ndev;
+	platform_set_drvdata(pdev, ndev);
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_ioremap_resource(d, res);
+	if (IS_ERR(priv->base)) {
+		ret = PTR_ERR(priv->base);
+		goto init_fail;
+	}
+
+	ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, &arg);
+	if (ret < 0) {
+		dev_warn(d, "no port-handle\n");
+		goto init_fail;
+	}
+
+	priv->port = arg.args[0];
+	priv->chan = arg.args[1];
+
+	priv->map = syscon_node_to_regmap(arg.np);
+	if (IS_ERR(priv->map)) {
+		dev_warn(d, "no syscon hisilicon,hip04-ppe\n");
+		ret = PTR_ERR(priv->map);
+		goto init_fail;
+	}
+
+	priv->phy_mode = of_get_phy_mode(node);
+	if (priv->phy_mode < 0) {
+		dev_warn(d, "phy-mode not found\n");
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq <= 0) {
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	ret = devm_request_irq(d, irq, hip04_mac_interrupt,
+				0, pdev->name, ndev);
+	if (ret) {
+		netdev_err(ndev, "devm_request_irq failed\n");
+		goto init_fail;
+	}
+
+	priv->phy_node = of_parse_phandle(node, "phy-handle", 0);
+	if (priv->phy_node) {
+		priv->phy = of_phy_connect(ndev, priv->phy_node,
+			&hip04_adjust_link, 0, priv->phy_mode);
+		if (!priv->phy) {
+			ret = -EPROBE_DEFER;
+			goto init_fail;
+		}
+	}
+
+	ether_setup(ndev);
+	ndev->netdev_ops = &hip04_netdev_ops;
+	ndev->watchdog_timeo = TX_TIMEOUT;
+	ndev->priv_flags |= IFF_UNICAST_FLT;
+	ndev->irq = irq;
+	netif_napi_add(ndev, &priv->napi, hip04_rx_poll, RX_DESC_NUM);
+	SET_NETDEV_DEV(ndev, &pdev->dev);
+
+	hip04_reset_ppe(priv);
+	if (priv->phy_mode == PHY_INTERFACE_MODE_MII)
+		hip04_config_port(ndev, SPEED_100, DUPLEX_FULL);
+
+	hip04_config_fifo(priv);
+	random_ether_addr(ndev->dev_addr);
+	hip04_update_mac_address(ndev);
+
+	ret = hip04_alloc_ring(ndev, d);
+	if (ret) {
+		netdev_err(ndev, "alloc ring fail\n");
+		goto alloc_fail;
+	}
+
+	setup_timer(&priv->txtimer, hip04_xmit_timer, (unsigned long) ndev);
+	ret = register_netdev(ndev);
+	if (ret)
+		goto alloc_fail;
+
+	return 0;
+
+alloc_fail:
+	hip04_free_ring(ndev, d);
+init_fail:
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+	return ret;
+}
+
+static int hip04_remove(struct platform_device *pdev)
+{
+	struct net_device *ndev = platform_get_drvdata(pdev);
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct device *d = &pdev->dev;
+
+	if (priv->phy)
+		phy_disconnect(priv->phy);
+
+	del_timer_sync(&priv->txtimer);
+	unregister_netdev(ndev);
+	hip04_free_ring(ndev, d);
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+
+	return 0;
+}
+
+static const struct of_device_id hip04_mac_match[] = {
+	{ .compatible = "hisilicon,hip04-mac" },
+	{ }
+};
+
+static struct platform_driver hip04_mac_driver = {
+	.probe	= hip04_mac_probe,
+	.remove	= hip04_remove,
+	.driver	= {
+		.name		= DRV_NAME,
+		.owner		= THIS_MODULE,
+		.of_match_table	= hip04_mac_match,
+	},
+};
+module_platform_driver(hip04_mac_driver);
+
+MODULE_DESCRIPTION("HISILICON P04 Ethernet driver");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:hip04-ether");
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
@ 2014-04-01 13:27   ` Zhangfei Gao
  0 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-04-01 13:27 UTC (permalink / raw)
  To: linux-arm-kernel

Add the Hisilicon hip04 ethernet driver, supporting both the 100M and 1000M controllers

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
---
 drivers/net/ethernet/hisilicon/Makefile    |    2 +-
 drivers/net/ethernet/hisilicon/hip04_eth.c |  771 ++++++++++++++++++++++++++++
 2 files changed, 772 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c

diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 1d6eb6e..17dec03 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -2,4 +2,4 @@
 # Makefile for the HISILICON network device drivers.
 #
 
-obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
+obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
new file mode 100644
index 0000000..db00337
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -0,0 +1,771 @@
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+#include <linux/of_address.h>
+#include <linux/phy.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
+#include <linux/mfd/syscon.h>
+#include <linux/regmap.h>
+
+#define PPE_CFG_RX_ADDR			0x100
+#define PPE_CFG_POOL_GRP		0x300
+#define PPE_CFG_RX_BUF_SIZE		0x400
+#define PPE_CFG_RX_FIFO_SIZE		0x500
+#define PPE_CURR_BUF_CNT		0xa200
+
+#define GE_DUPLEX_TYPE			0x8
+#define GE_MAX_FRM_SIZE_REG		0x3c
+#define GE_PORT_MODE			0x40
+#define GE_PORT_EN			0x44
+#define GE_SHORT_RUNTS_THR_REG		0x50
+#define GE_TX_LOCAL_PAGE_REG		0x5c
+#define GE_TRANSMIT_CONTROL_REG		0x60
+#define GE_CF_CRC_STRIP_REG		0x1b0
+#define GE_MODE_CHANGE_EN		0x1b4
+#define GE_RECV_CONTROL_REG		0x1e0
+#define GE_STATION_MAC_ADDRESS		0x210
+#define PPE_CFG_TX_PKT_BD_ADDR		0x420
+#define PPE_CFG_MAX_FRAME_LEN_REG	0x408
+#define PPE_CFG_BUS_CTRL_REG		0x424
+#define PPE_CFG_RX_CTRL_REG		0x428
+#define PPE_CFG_RX_PKT_MODE_REG		0x438
+#define PPE_CFG_QOS_VMID_GEN		0x500
+#define PPE_CFG_RX_PKT_INT		0x538
+#define PPE_INTEN			0x600
+#define PPE_INTSTS			0x608
+#define PPE_RINT			0x604
+#define PPE_CFG_STS_MODE		0x700
+#define PPE_HIS_RX_PKT_CNT		0x804
+
+/* REG_INTERRUPT */
+#define RCV_INT				BIT(10)
+#define RCV_NOBUF			BIT(8)
+#define DEF_INT_MASK			(RCV_INT | RCV_NOBUF)
+
+#define RX_DESC_NUM			64
+#define TX_DESC_NUM			64
+#define TX_NEXT(N)			(((N) + 1) & (TX_DESC_NUM-1))
+#define RX_NEXT(N)			(((N) + 1) & (RX_DESC_NUM-1))
+
+#define GMAC_PPE_RX_PKT_MAX_LEN		379
+#define GMAC_MAX_PKT_LEN		1516
+#define DESC_DEF_CFG			0x14
+#define RX_BUF_SIZE			1600
+#define RX_PKT_ERR			0x3
+#define TX_TIMEOUT			(6 * HZ)
+#define RECLAIM_PERIOD			HZ
+
+#define DRV_NAME			"hip04-ether"
+
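The TX_NEXT/RX_NEXT macros above rely on the descriptor counts being powers of two, so masking with `N - 1` equals taking the incremented index modulo N. A quick standalone check of that identity:

```c
#include <assert.h>

#define TX_DESC_NUM 64                      /* must be a power of two */
#define TX_NEXT(N)  (((N) + 1) & (TX_DESC_NUM - 1))

/* Reference implementation: plain modular increment. */
static unsigned next_mod(unsigned n)
{
	return (n + 1) % TX_DESC_NUM;
}
```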
+struct tx_desc {
+	u32 send_addr;
+	u16 reserved_16;
+	u16 send_size;
+	u32 reserved_32;
+	u32 cfg;
+	u32 wb_addr;
+} ____cacheline_aligned;
+
+struct rx_desc {
+	u16 reserved_16;
+	u16 pkt_len;
+	u32 reserve1[3];
+	u32 pkt_err;
+	u32 reserve2[4];
+};
+
+struct hip04_priv {
+	void __iomem *base;
+	int phy_mode;
+	int chan;
+	unsigned int port;
+	unsigned int speed;
+	unsigned int duplex;
+	unsigned int reg_inten;
+
+	struct napi_struct napi;
+	struct net_device *ndev;
+
+	struct tx_desc *tx_desc;
+	dma_addr_t tx_desc_dma;
+	struct sk_buff *tx_skb[TX_DESC_NUM];
+	dma_addr_t tx_phys[TX_DESC_NUM];
+	spinlock_t lock;
+	unsigned int tx_head;
+	unsigned int tx_tail;
+	unsigned int tx_count;
+
+	unsigned char *rx_buf[RX_DESC_NUM];
+	dma_addr_t rx_phys[RX_DESC_NUM];
+	unsigned int rx_head;
+	unsigned int rx_buf_size;
+
+	struct device_node *phy_node;
+	struct phy_device *phy;
+	struct regmap *map;
+	struct timer_list txtimer;
+};
+
+static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	priv->speed = speed;
+	priv->duplex = duplex;
+
+	switch (priv->phy_mode) {
+	case PHY_INTERFACE_MODE_SGMII:
+		if (speed == SPEED_1000)
+			val = 8;
+		else if (speed == SPEED_100)
+			val = 7;
+		else
+			val = 6;	/* SPEED_10 */
+		break;
+	case PHY_INTERFACE_MODE_MII:
+		if (speed == SPEED_100)
+			val = 1;
+		else
+			val = 0;	/* SPEED_10 */
+		break;
+	default:
+		netdev_warn(ndev, "unsupported phy mode\n");
+		val = 0;
+		break;
+	}
+	writel_relaxed(val, priv->base + GE_PORT_MODE);
+
+	val = (duplex) ? BIT(0) : 0;
+	writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
+
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_MODE_CHANGE_EN);
+}
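
The switch above maps (interface mode, speed) to a GE_PORT_MODE register value: 8/7/6 for SGMII at 1000/100/10 Mbit/s and 1/0 for MII at 100/10. Mirrored as a standalone table for illustration (register encodings taken from the code above; `MODE_*` names are hypothetical):

```c
#include <assert.h>

enum { MODE_MII, MODE_SGMII };

/* Port-mode register value as selected in hip04_config_port(). */
static int port_mode_val(int phy_mode, int speed)
{
	if (phy_mode == MODE_SGMII)
		return speed == 1000 ? 8 : speed == 100 ? 7 : 6;
	return speed == 100 ? 1 : 0;   /* MII */
}
```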
+
+static void hip04_reset_ppe(struct hip04_priv *priv)
+{
+	u32 val, tmp;
+
+	do {
+		regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
+		regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
+	} while (val & 0xfff);
+}
+
+static void hip04_config_fifo(struct hip04_priv *priv)
+{
+	u32 val;
+
+	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
+	val |= BIT(12);			/* PPE_HIS_RX_PKT_CNT read clear */
+	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
+
+	val = BIT(priv->port);
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
+
+	val = priv->port << 8;
+	val |= BIT(14);
+	writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
+
+	val = RX_BUF_SIZE;
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
+
+	val = RX_DESC_NUM << 16;	/* depth */
+	val |= BIT(11);			/* seq: first set first use */
+	val |= RX_DESC_NUM * priv->chan;	/* start_addr */
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_FIFO_SIZE, val);
+
+	/* pkt store format */
+	val = NET_IP_ALIGN << 11;	/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
+
+	/* following cfg required for 1000M */
+	/* pkt mode */
+	val = BIT(18);			/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
+
+	/* set bus ctrl */
+	val = BIT(14);			/* buffer locally release */
+	val |= BIT(0);			/* big endian */
+	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
+
+	/* set max pkt len, curtail if exceed */
+	val = GMAC_PPE_RX_PKT_MAX_LEN;	/* max buffer len */
+	writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
+
+	/* set max len of each pkt */
+	val = GMAC_MAX_PKT_LEN;		/* max buffer len */
+	writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
+
+	/* set min len of each pkt */
+	val = 31;			/* min buffer len */
+	writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
+
+	/* tx */
+	val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
+	val |= BIT(5);			/* tx auto neg */
+	val |= BIT(6);			/* tx add crc */
+	val |= BIT(7);			/* tx short pad through */
+	writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
+
+	/* rx crc */
+	val = BIT(0);			/* rx strip crc */
+	writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
+
+	/* rx */
+	val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
+	val |= BIT(3);			/* rx strip pad */
+	val |= BIT(4);			/* run pkt en */
+	writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
+
+	/* auto neg control */
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
+}
+
+static void hip04_mac_enable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* enable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val |= BIT(1);		/* rx */
+	val |= BIT(2);		/* tx */
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+
+	/* clear rx int */
+	val = RCV_INT;
+	writel_relaxed(val, priv->base + PPE_RINT);
+
+	/* config recv int */
+	val = BIT(6);		/* int threshold 1 package */
+	val |= 0x4;		/* recv timeout */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
+
+	/* enable interrupt */
+	priv->reg_inten = DEF_INT_MASK;
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+}
+
+static void hip04_mac_disable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* disable int */
+	priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+	/* disable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val &= ~(BIT(1));	/* rx */
+	val &= ~(BIT(2));	/* tx */
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+}
+
+static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	writel(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
+}
+
+static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
+}
+
+static u32 hip04_recv_cnt(struct hip04_priv *priv)
+{
+	return readl(priv->base + PPE_HIS_RX_PKT_CNT);
+}
+
+static void hip04_update_mac_address(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
+			priv->base + GE_STATION_MAC_ADDRESS);
+	writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
+			(ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
+			priv->base + GE_STATION_MAC_ADDRESS + 4);
+}
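
The two writes above pack the 6-byte station address into consecutive 32-bit register words: bytes 0-1 into the low half of the first word, bytes 2-5 into the second. The packing can be shown as a standalone helper (hypothetical, for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 6-byte MAC the way the driver fills GE_STATION_MAC_ADDRESS. */
static void mac_to_words(const uint8_t mac[6], uint32_t *w0, uint32_t *w1)
{
	*w0 = (mac[0] << 8) | mac[1];
	*w1 = ((uint32_t)mac[2] << 24) | (mac[3] << 16) |
	      (mac[4] << 8) | mac[5];
}
```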
+
+static int hip04_set_mac_address(struct net_device *ndev, void *addr)
+{
+	eth_mac_addr(ndev, addr);
+	hip04_update_mac_address(ndev);
+	return 0;
+}
+
+static void hip04_tx_reclaim(struct net_device *ndev, bool force)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	unsigned tx_head = priv->tx_head;
+	unsigned tx_tail = priv->tx_tail;
+	struct tx_desc *desc;
+
+	spin_lock_bh(&priv->lock);
+	while ((tx_tail != tx_head) || (priv->tx_count == TX_DESC_NUM)) {
+		desc = &priv->tx_desc[priv->tx_tail];
+		if (desc->send_addr != 0) {
+			if (force)
+				desc->send_addr = 0;
+			else
+				break;
+		}
+		if (priv->tx_phys[tx_tail]) {
+			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
+				priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
+			priv->tx_phys[tx_tail] = 0;
+		}
+		dev_kfree_skb(priv->tx_skb[tx_tail]);
+		priv->tx_skb[tx_tail] = NULL;
+		tx_tail = TX_NEXT(tx_tail);
+		priv->tx_count--;
+	}
+	priv->tx_tail = tx_tail;
+	spin_unlock_bh(&priv->lock);
+
+	if (priv->tx_count)
+		mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
+
+	if (unlikely(netif_queue_stopped(ndev)) &&
+		(priv->tx_count < TX_DESC_NUM))
+		netif_wake_queue(ndev);
+}
+
+static void hip04_xmit_timer(unsigned long data)
+{
+	struct net_device *ndev = (void *) data;
+
+	hip04_tx_reclaim(ndev, false);
+}
+
+static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int tx_head = priv->tx_head;
+	struct tx_desc *desc = &priv->tx_desc[tx_head];
+	dma_addr_t phys;
+
+	hip04_tx_reclaim(ndev, false);
+	mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
+
+	if (priv->tx_count >= TX_DESC_NUM) {
+		netif_stop_queue(ndev);
+		return NETDEV_TX_BUSY;
+	}
+
+	phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
+	if (dma_mapping_error(&ndev->dev, phys)) {
+		dev_kfree_skb(skb);
+		return NETDEV_TX_OK;
+	}
+
+	priv->tx_skb[tx_head] = skb;
+	priv->tx_phys[tx_head] = phys;
+	desc->send_addr = cpu_to_be32(phys);
+	desc->send_size = cpu_to_be16(skb->len);
+	desc->cfg = cpu_to_be32(DESC_DEF_CFG);
+	phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
+	desc->wb_addr = cpu_to_be32(phys);
+	skb_tx_timestamp(skb);
+
+	/* Don't wait up for transmitted skbs to be freed. */
+	skb_orphan(skb);
+
+	hip04_set_xmit_desc(priv, phys);
+	priv->tx_head = TX_NEXT(tx_head);
+
+	stats->tx_bytes += skb->len;
+	stats->tx_packets++;
+	priv->tx_count++;
+
+	return NETDEV_TX_OK;
+}
+
+static int hip04_rx_poll(struct napi_struct *napi, int budget)
+{
+	struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
+	struct net_device *ndev = priv->ndev;
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int cnt = hip04_recv_cnt(priv);
+	struct rx_desc *desc;
+	struct sk_buff *skb;
+	unsigned char *buf;
+	bool last = false;
+	dma_addr_t phys;
+	int rx = 0;
+	u16 len;
+	u32 err;
+
+	while (cnt && !last) {
+		buf = priv->rx_buf[priv->rx_head];
+		skb = build_skb(buf, priv->rx_buf_size);
+		if (unlikely(!skb)) {
+			net_dbg_ratelimited("build_skb failed\n");
+			break;
+		}
+
+		dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_phys[priv->rx_head] = 0;
+
+		desc = (struct rx_desc *)skb->data;
+		len = be16_to_cpu(desc->pkt_len);
+		err = be32_to_cpu(desc->pkt_err);
+
+		if (len > RX_BUF_SIZE)
+			len = RX_BUF_SIZE;
+
+		if (len == 0) {
+			dev_kfree_skb_any(skb);
+			last = true;
+		} else if (err & RX_PKT_ERR) {
+			dev_kfree_skb_any(skb);
+			stats->rx_dropped++;
+			stats->rx_errors++;
+		} else {
+			skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+			skb_put(skb, len);
+			skb->protocol = eth_type_trans(skb, ndev);
+			napi_gro_receive(&priv->napi, skb);
+			stats->rx_packets++;
+			stats->rx_bytes += len;
+			rx++;
+		}
+
+		buf = netdev_alloc_frag(priv->rx_buf_size);
+		if (!buf)
+			return -ENOMEM;
+		phys = dma_map_single(&ndev->dev, buf,
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			return -EIO;
+		priv->rx_buf[priv->rx_head] = buf;
+		priv->rx_phys[priv->rx_head] = phys;
+		hip04_set_recv_desc(priv, phys);
+
+		priv->rx_head = RX_NEXT(priv->rx_head);
+		if (rx >= budget)
+			break;
+
+		if (--cnt == 0)
+			cnt = hip04_recv_cnt(priv);
+	}
+
+	if (rx < budget) {
+		napi_complete(napi);
+
+		/* enable rx interrupt */
+		priv->reg_inten |= RCV_INT | RCV_NOBUF;
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+	}
+
+	return rx;
+}
+
+static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
+{
+	struct net_device *ndev = (struct net_device *) dev_id;
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
+
+	writel_relaxed(DEF_INT_MASK, priv->base + PPE_RINT);
+
+	if (ists & (RCV_INT | RCV_NOBUF)) {
+		/* disable rx interrupt */
+		priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+		napi_schedule(&priv->napi);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void hip04_adjust_link(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct phy_device *phy = priv->phy;
+
+	if ((priv->speed != phy->speed) || (priv->duplex != phy->duplex)) {
+		hip04_config_port(ndev, phy->speed, phy->duplex);
+		phy_print_status(phy);
+	}
+}
+
+static int hip04_mac_open(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->rx_head = 0;
+	priv->tx_head = 0;
+	priv->tx_tail = 0;
+	priv->tx_count = 0;
+
+	hip04_reset_ppe(priv);
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		dma_addr_t phys;
+
+		phys = dma_map_single(&ndev->dev, priv->rx_buf[i],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			return -EIO;
+
+		priv->rx_phys[i] = phys;
+		hip04_set_recv_desc(priv, phys);
+	}
+
+	if (priv->phy)
+		phy_start(priv->phy);
+
+	netif_start_queue(ndev);
+	hip04_mac_enable(ndev);
+	napi_enable(&priv->napi);
+
+	return 0;
+}
+
+static int hip04_mac_stop(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	napi_disable(&priv->napi);
+	netif_stop_queue(ndev);
+	hip04_mac_disable(ndev);
+	hip04_tx_reclaim(ndev, true);
+	hip04_reset_ppe(priv);
+
+	if (priv->phy)
+		phy_stop(priv->phy);
+
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		if (priv->rx_phys[i]) {
+			dma_unmap_single(&ndev->dev, priv->rx_phys[i],
+					RX_BUF_SIZE, DMA_FROM_DEVICE);
+			priv->rx_phys[i] = 0;
+		}
+	}
+
+	return 0;
+}
+
+static void hip04_timeout(struct net_device *ndev)
+{
+	hip04_mac_stop(ndev);
+	hip04_mac_open(ndev);
+}
+
+static const struct net_device_ops hip04_netdev_ops = {
+	.ndo_open		= hip04_mac_open,
+	.ndo_stop		= hip04_mac_stop,
+	.ndo_start_xmit		= hip04_mac_start_xmit,
+	.ndo_set_mac_address	= hip04_set_mac_address,
+	.ndo_tx_timeout         = hip04_timeout,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_change_mtu		= eth_change_mtu,
+};
+
+static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->tx_desc = dma_alloc_coherent(d,
+			TX_DESC_NUM * sizeof(struct tx_desc),
+			&priv->tx_desc_dma, GFP_KERNEL);
+	if (!priv->tx_desc)
+		return -ENOMEM;
+
+	priv->rx_buf_size = RX_BUF_SIZE +
+			    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		priv->rx_buf[i] = netdev_alloc_frag(priv->rx_buf_size);
+		if (!priv->rx_buf[i])
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void hip04_free_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	for (i = 0; i < RX_DESC_NUM; i++)
+		if (priv->rx_buf[i])
+			put_page(virt_to_head_page(priv->rx_buf[i]));
+
+	for (i = 0; i < TX_DESC_NUM; i++)
+		if (priv->tx_skb[i])
+			dev_kfree_skb_any(priv->tx_skb[i]);
+
+	dma_free_coherent(d, TX_DESC_NUM * sizeof(struct tx_desc),
+			priv->tx_desc, priv->tx_desc_dma);
+}
+
+static int hip04_mac_probe(struct platform_device *pdev)
+{
+	struct device *d = &pdev->dev;
+	struct device_node *node = d->of_node;
+	struct of_phandle_args arg;
+	struct net_device *ndev;
+	struct hip04_priv *priv;
+	struct resource *res;
+	unsigned int irq;
+	int ret;
+
+	ndev = alloc_etherdev(sizeof(struct hip04_priv));
+	if (!ndev)
+		return -ENOMEM;
+
+	priv = netdev_priv(ndev);
+	priv->ndev = ndev;
+	platform_set_drvdata(pdev, ndev);
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_ioremap_resource(d, res);
+	if (IS_ERR(priv->base)) {
+		ret = PTR_ERR(priv->base);
+		goto init_fail;
+	}
+
+	ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, &arg);
+	if (ret < 0) {
+		dev_warn(d, "no port-handle\n");
+		goto init_fail;
+	}
+
+	priv->port = arg.args[0];
+	priv->chan = arg.args[1];
+
+	priv->map = syscon_node_to_regmap(arg.np);
+	if (IS_ERR(priv->map)) {
+		dev_warn(d, "no syscon hisilicon,hip04-ppe\n");
+		ret = PTR_ERR(priv->map);
+		goto init_fail;
+	}
+
+	priv->phy_mode = of_get_phy_mode(node);
+	if (priv->phy_mode < 0) {
+		dev_warn(d, "phy-mode not found\n");
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq <= 0) {
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	ret = devm_request_irq(d, irq, hip04_mac_interrupt,
+				0, pdev->name, ndev);
+	if (ret) {
+		netdev_err(ndev, "devm_request_irq failed\n");
+		goto init_fail;
+	}
+
+	priv->phy_node = of_parse_phandle(node, "phy-handle", 0);
+	if (priv->phy_node) {
+		priv->phy = of_phy_connect(ndev, priv->phy_node,
+			&hip04_adjust_link, 0, priv->phy_mode);
+		if (!priv->phy) {
+			ret = -EPROBE_DEFER;
+			goto init_fail;
+		}
+	}
+
+	ether_setup(ndev);
+	ndev->netdev_ops = &hip04_netdev_ops;
+	ndev->watchdog_timeo = TX_TIMEOUT;
+	ndev->priv_flags |= IFF_UNICAST_FLT;
+	ndev->irq = irq;
+	netif_napi_add(ndev, &priv->napi, hip04_rx_poll, RX_DESC_NUM);
+	SET_NETDEV_DEV(ndev, &pdev->dev);
+
+	hip04_reset_ppe(priv);
+	if (priv->phy_mode == PHY_INTERFACE_MODE_MII)
+		hip04_config_port(ndev, SPEED_100, DUPLEX_FULL);
+
+	hip04_config_fifo(priv);
+	random_ether_addr(ndev->dev_addr);
+	hip04_update_mac_address(ndev);
+
+	ret = hip04_alloc_ring(ndev, d);
+	if (ret) {
+		netdev_err(ndev, "alloc ring fail\n");
+		goto alloc_fail;
+	}
+
+	setup_timer(&priv->txtimer, hip04_xmit_timer, (unsigned long) ndev);
+	ret = register_netdev(ndev);
+	if (ret)
+		goto alloc_fail;
+
+	return 0;
+
+alloc_fail:
+	hip04_free_ring(ndev, d);
+init_fail:
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+	return ret;
+}
+
+static int hip04_remove(struct platform_device *pdev)
+{
+	struct net_device *ndev = platform_get_drvdata(pdev);
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct device *d = &pdev->dev;
+
+	if (priv->phy)
+		phy_disconnect(priv->phy);
+
+	del_timer_sync(&priv->txtimer);
+	unregister_netdev(ndev);
+	hip04_free_ring(ndev, d);
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+
+	return 0;
+}
+
+static const struct of_device_id hip04_mac_match[] = {
+	{ .compatible = "hisilicon,hip04-mac" },
+	{ }
+};
+
+static struct platform_driver hip04_mac_driver = {
+	.probe	= hip04_mac_probe,
+	.remove	= hip04_remove,
+	.driver	= {
+		.name		= DRV_NAME,
+		.owner		= THIS_MODULE,
+		.of_match_table	= hip04_mac_match,
+	},
+};
+module_platform_driver(hip04_mac_driver);
+
+MODULE_DESCRIPTION("HISILICON P04 Ethernet driver");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:hip04-ether");
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-28 15:35 [PATCH v4 0/3] add hisilicon " Zhangfei Gao
@ 2014-03-28 15:36   ` Zhangfei Gao
  0 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-28 15:36 UTC (permalink / raw)
  To: davem, linux, arnd, f.fainelli, sergei.shtylyov, mark.rutland,
	David.Laight, eric.dumazet
  Cc: linux-arm-kernel, netdev, devicetree, Zhangfei Gao

Add the Hisilicon hip04 ethernet driver, supporting both the 100M and 1000M controllers

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
---
 drivers/net/ethernet/hisilicon/Makefile    |    2 +-
 drivers/net/ethernet/hisilicon/hip04_eth.c |  769 ++++++++++++++++++++++++++++
 2 files changed, 770 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c

diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 1d6eb6e..5af9b54 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -2,4 +2,4 @@
 # Makefile for the HISILICON network device drivers.
 #
 
-obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
+obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o hip04_eth.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
new file mode 100644
index 0000000..efff711
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -0,0 +1,769 @@
+
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+#include <linux/of_address.h>
+#include <linux/phy.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
+#include <linux/mfd/syscon.h>
+#include <linux/regmap.h>
+
+#define PPE_CFG_RX_ADDR			0x100
+#define PPE_CFG_POOL_GRP		0x300
+#define PPE_CFG_RX_BUF_SIZE		0x400
+#define PPE_CFG_RX_FIFO_SIZE		0x500
+#define PPE_CURR_BUF_CNT		0xa200
+
+#define GE_DUPLEX_TYPE			0x8
+#define GE_MAX_FRM_SIZE_REG		0x3c
+#define GE_PORT_MODE			0x40
+#define GE_PORT_EN			0x44
+#define GE_SHORT_RUNTS_THR_REG		0x50
+#define GE_TX_LOCAL_PAGE_REG		0x5c
+#define GE_TRANSMIT_CONTROL_REG		0x60
+#define GE_CF_CRC_STRIP_REG		0x1b0
+#define GE_MODE_CHANGE_EN		0x1b4
+#define GE_RECV_CONTROL_REG		0x1e0
+#define GE_STATION_MAC_ADDRESS		0x210
+#define PPE_CFG_TX_PKT_BD_ADDR		0x420
+#define PPE_CFG_MAX_FRAME_LEN_REG	0x408
+#define PPE_CFG_BUS_CTRL_REG		0x424
+#define PPE_CFG_RX_CTRL_REG		0x428
+#define PPE_CFG_RX_PKT_MODE_REG		0x438
+#define PPE_CFG_QOS_VMID_GEN		0x500
+#define PPE_CFG_RX_PKT_INT		0x538
+#define PPE_INTEN			0x600
+#define PPE_INTSTS			0x608
+#define PPE_RINT			0x604
+#define PPE_CFG_STS_MODE		0x700
+#define PPE_HIS_RX_PKT_CNT		0x804
+
+/* REG_INTERRUPT */
+#define RCV_INT				BIT(10)
+#define RCV_NOBUF			BIT(8)
+#define DEF_INT_MASK			(RCV_INT | RCV_NOBUF)
+
+#define RX_DESC_NUM			64
+#define TX_DESC_NUM			64
+#define TX_NEXT(N)			(((N) + 1) & (TX_DESC_NUM-1))
+#define RX_NEXT(N)			(((N) + 1) & (RX_DESC_NUM-1))
+
+#define GMAC_PPE_RX_PKT_MAX_LEN		379
+#define GMAC_MAX_PKT_LEN		1516
+#define DESC_DEF_CFG			0x14
+#define RX_BUF_SIZE			1600
+#define RX_PKT_ERR			0x3
+#define TX_TIMEOUT			(6 * HZ)
+#define RECLAIM_PERIOD			HZ
+
+#define DRV_NAME			"hip04-ether"
+
+struct tx_desc {
+	u32 send_addr;
+	u16 reserved_16;
+	u16 send_size;
+	u32 reserved_32;
+	u32 cfg;
+	u32 wb_addr;
+} ____cacheline_aligned;
+
+struct rx_desc {
+	u16 reserved_16;
+	u16 pkt_len;
+	u32 reserve1[3];
+	u32 pkt_err;
+	u32 reserve2[4];
+};
+
+struct hip04_priv {
+	void __iomem *base;
+	int phy_mode;
+	int chan;
+	unsigned int port;
+	unsigned int speed;
+	unsigned int duplex;
+	unsigned int reg_inten;
+
+	struct napi_struct napi;
+	struct net_device *ndev;
+
+	struct tx_desc *tx_desc;
+	dma_addr_t tx_desc_dma;
+	struct sk_buff *tx_skb[TX_DESC_NUM];
+	dma_addr_t tx_phys[TX_DESC_NUM];
+	spinlock_t lock;
+	unsigned int tx_head;
+	unsigned int tx_tail;
+	unsigned int tx_count;
+
+	unsigned char *rx_buf[RX_DESC_NUM];
+	dma_addr_t rx_phys[RX_DESC_NUM];
+	unsigned int rx_head;
+	unsigned int rx_buf_size;
+
+	struct device_node *phy_node;
+	struct phy_device *phy;
+	struct regmap *map;
+	struct timer_list txtimer;
+};
+
+static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	priv->speed = speed;
+	priv->duplex = duplex;
+
+	switch (priv->phy_mode) {
+	case PHY_INTERFACE_MODE_SGMII:
+		if (speed == SPEED_1000)
+			val = 8;
+		else if (speed == SPEED_100)
+			val = 7;
+		else
+			val = 6;	/* SPEED_10 */
+		break;
+	case PHY_INTERFACE_MODE_MII:
+		if (speed == SPEED_100)
+			val = 1;
+		else
+			val = 0;	/* SPEED_10 */
+		break;
+	default:
+		netdev_warn(ndev, "unsupported phy interface mode\n");
+		val = 0;
+		break;
+	}
+	writel_relaxed(val, priv->base + GE_PORT_MODE);
+
+	val = (duplex) ? BIT(0) : 0;
+	writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
+
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_MODE_CHANGE_EN);
+}
+
+static void hip04_reset_ppe(struct hip04_priv *priv)
+{
+	u32 val, tmp;
+
+	do {
+		regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
+		regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
+	} while (val & 0xfff);
+}
+
+static void hip04_config_fifo(struct hip04_priv *priv)
+{
+	u32 val;
+
+	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
+	val |= BIT(12);			/* PPE_HIS_RX_PKT_CNT read clear */
+	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
+
+	val = BIT(priv->port);
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
+
+	val = priv->port << 8;
+	val |= BIT(14);
+	writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
+
+	val = RX_BUF_SIZE;
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
+
+	val = RX_DESC_NUM << 16;	/* depth */
+	val |= BIT(11);			/* seq: first set first use */
+	val |= RX_DESC_NUM * priv->chan;	/* start_addr */
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_FIFO_SIZE, val);
+
+	/* pkt store format */
+	val = NET_IP_ALIGN << 11;	/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
+
+	/* following cfg required for 1000M */
+	/* pkt mode */
+	val = BIT(18);			/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
+
+	/* set bus ctrl */
+	val = BIT(14);			/* buffer locally release */
+	val |= BIT(0);			/* big endian */
+	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
+
+	/* set max pkt len, curtail if exceed */
+	val = GMAC_PPE_RX_PKT_MAX_LEN;	/* max buffer len */
+	writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
+
+	/* set max len of each pkt */
+	val = GMAC_MAX_PKT_LEN;		/* max buffer len */
+	writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
+
+	/* set min len of each pkt */
+	val = 31;			/* min buffer len */
+	writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
+
+	/* tx */
+	val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
+	val |= BIT(5);			/* tx auto neg */
+	val |= BIT(6);			/* tx add crc */
+	val |= BIT(7);			/* tx short pad through */
+	writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
+
+	/* rx crc */
+	val = BIT(0);			/* rx strip crc */
+	writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
+
+	/* rx */
+	val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
+	val |= BIT(3);			/* rx strip pad */
+	val |= BIT(4);			/* run pkt en */
+	writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
+
+	/* auto neg control */
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
+}
+
+static void hip04_mac_enable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* enable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val |= BIT(1);		/* rx*/
+	val |= BIT(2);		/* tx*/
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+
+	/* clear rx int */
+	val = RCV_INT;
+	writel_relaxed(val, priv->base + PPE_RINT);
+
+	/* config recv int*/
+	val = BIT(6);		/* int threshold 1 package */
+	val |= 0x4;		/* recv timeout */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
+
+	/* enable interrupt */
+	priv->reg_inten = DEF_INT_MASK;
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+}
+
+static void hip04_mac_disable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* disable int */
+	priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+	/* disable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val &= ~(BIT(1));	/* rx*/
+	val &= ~(BIT(2));	/* tx*/
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+}
+
+static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	writel(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
+}
+
+static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
+}
+
+static u32 hip04_recv_cnt(struct hip04_priv *priv)
+{
+	return readl(priv->base + PPE_HIS_RX_PKT_CNT);
+}
+
+static void hip04_update_mac_address(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
+			priv->base + GE_STATION_MAC_ADDRESS);
+	writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
+			(ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
+			priv->base + GE_STATION_MAC_ADDRESS + 4);
+}
+
+static int hip04_set_mac_address(struct net_device *ndev, void *addr)
+{
+	eth_mac_addr(ndev, addr);
+	hip04_update_mac_address(ndev);
+	return 0;
+}
+
+static void hip04_tx_reclaim(struct net_device *ndev, bool force)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	unsigned tx_head = priv->tx_head;
+	unsigned tx_tail = priv->tx_tail;
+	struct tx_desc *desc;
+
+	spin_lock_bh(&priv->lock);
+	while ((tx_tail != tx_head) || (priv->tx_count == TX_DESC_NUM)) {
+		if (priv->tx_phys[tx_tail]) {
+			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
+				priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
+			priv->tx_phys[tx_tail] = 0;
+		}
+		desc = &priv->tx_desc[tx_tail];
+		if (desc->send_addr != 0) {
+			if (force)
+				desc->send_addr = 0;
+			else
+				break;
+		}
+		dev_kfree_skb(priv->tx_skb[tx_tail]);
+		priv->tx_skb[tx_tail] = NULL;
+		tx_tail = TX_NEXT(tx_tail);
+		priv->tx_count--;
+	}
+	priv->tx_tail = tx_tail;
+	spin_unlock_bh(&priv->lock);
+
+	if (priv->tx_count)
+		mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
+
+	if (unlikely(netif_queue_stopped(ndev)) &&
+		(priv->tx_count < TX_DESC_NUM))
+		netif_wake_queue(ndev);
+}
+
+static void hip04_xmit_timer(unsigned long data)
+{
+	struct net_device *ndev = (void *)data;
+
+	hip04_tx_reclaim(ndev, false);
+}
+
+static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int tx_head = priv->tx_head;
+	struct tx_desc *desc = &priv->tx_desc[tx_head];
+	dma_addr_t phys;
+
+	hip04_tx_reclaim(ndev, false);
+	mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
+
+	if (priv->tx_count >= TX_DESC_NUM) {
+		netif_stop_queue(ndev);
+		return NETDEV_TX_BUSY;
+	}
+
+	phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
+	if (dma_mapping_error(&ndev->dev, phys)) {
+		dev_kfree_skb(skb);
+		return NETDEV_TX_OK;
+	}
+
+	priv->tx_skb[tx_head] = skb;
+	priv->tx_phys[tx_head] = phys;
+	desc->send_addr = cpu_to_be32(phys);
+	desc->send_size = cpu_to_be16(skb->len);
+	desc->cfg = cpu_to_be32(DESC_DEF_CFG);
+	phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
+	desc->wb_addr = cpu_to_be32(phys);
+	skb_tx_timestamp(skb);
+
+	/* Don't wait for transmitted skbs to be freed. */
+	skb_orphan(skb);
+
+	hip04_set_xmit_desc(priv, phys);
+	priv->tx_head = TX_NEXT(tx_head);
+
+	stats->tx_bytes += skb->len;
+	stats->tx_packets++;
+	priv->tx_count++;
+
+	return NETDEV_TX_OK;
+}
+
+static int hip04_rx_poll(struct napi_struct *napi, int budget)
+{
+	struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
+	struct net_device *ndev = priv->ndev;
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int cnt = hip04_recv_cnt(priv);
+	struct rx_desc *desc;
+	struct sk_buff *skb;
+	unsigned char *buf;
+	bool last = false;
+	dma_addr_t phys;
+	int rx = 0;
+	u16 len;
+	u32 err;
+
+	while (cnt && !last) {
+		buf = priv->rx_buf[priv->rx_head];
+		skb = build_skb(buf, priv->rx_buf_size);
+		if (unlikely(!skb)) {
+			net_dbg_ratelimited("build_skb failed\n");
+			goto refill;
+		}
+
+		dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_phys[priv->rx_head] = 0;
+
+		desc = (struct rx_desc *)skb->data;
+		len = be16_to_cpu(desc->pkt_len);
+		err = be32_to_cpu(desc->pkt_err);
+
+		if (len > RX_BUF_SIZE)
+			len = RX_BUF_SIZE;
+
+		if (len == 0) {
+			dev_kfree_skb_any(skb);
+			last = true;
+		} else if (err & RX_PKT_ERR) {
+			dev_kfree_skb_any(skb);
+			stats->rx_dropped++;
+			stats->rx_errors++;
+		} else {
+			skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+			skb_put(skb, len);
+			skb->protocol = eth_type_trans(skb, ndev);
+			napi_gro_receive(&priv->napi, skb);
+			stats->rx_packets++;
+			stats->rx_bytes += len;
+			rx++;
+		}
+
+refill:
+		buf = netdev_alloc_frag(priv->rx_buf_size);
+		if (!buf)
+			return -ENOMEM;
+		phys = dma_map_single(&ndev->dev, buf,
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			return -EIO;
+		priv->rx_buf[priv->rx_head] = buf;
+		priv->rx_phys[priv->rx_head] = phys;
+		hip04_set_recv_desc(priv, phys);
+
+		priv->rx_head = RX_NEXT(priv->rx_head);
+		if (rx >= budget)
+			break;
+
+		if (--cnt == 0)
+			cnt = hip04_recv_cnt(priv);
+	}
+
+	if (rx < budget) {
+		napi_complete(napi);
+
+		/* enable rx interrupt */
+		priv->reg_inten |= RCV_INT | RCV_NOBUF;
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+	}
+
+	return rx;
+}
+
+static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
+{
+	struct net_device *ndev = (struct net_device *) dev_id;
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
+
+	writel_relaxed(DEF_INT_MASK, priv->base + PPE_RINT);
+
+	if (ists & (RCV_INT | RCV_NOBUF)) {
+		/* disable rx interrupt */
+		priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+		napi_schedule(&priv->napi);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void hip04_adjust_link(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct phy_device *phy = priv->phy;
+
+	if ((priv->speed != phy->speed) || (priv->duplex != phy->duplex)) {
+		hip04_config_port(ndev, phy->speed, phy->duplex);
+		phy_print_status(phy);
+	}
+}
+
+static int hip04_mac_open(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->rx_head = 0;
+	priv->tx_head = 0;
+	priv->tx_tail = 0;
+	priv->tx_count = 0;
+
+	hip04_reset_ppe(priv);
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		dma_addr_t phys;
+
+		phys = dma_map_single(&ndev->dev, priv->rx_buf[i],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			return -EIO;
+
+		priv->rx_phys[i] = phys;
+		hip04_set_recv_desc(priv, phys);
+	}
+
+	if (priv->phy)
+		phy_start(priv->phy);
+
+	netif_start_queue(ndev);
+	hip04_mac_enable(ndev);
+	napi_enable(&priv->napi);
+
+	return 0;
+}
+
+static int hip04_mac_stop(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	napi_disable(&priv->napi);
+	netif_stop_queue(ndev);
+	hip04_mac_disable(ndev);
+	hip04_tx_reclaim(ndev, true);
+	hip04_reset_ppe(priv);
+
+	if (priv->phy)
+		phy_stop(priv->phy);
+
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		if (priv->rx_phys[i]) {
+			dma_unmap_single(&ndev->dev, priv->rx_phys[i],
+					RX_BUF_SIZE, DMA_FROM_DEVICE);
+			priv->rx_phys[i] = 0;
+		}
+	}
+
+	return 0;
+}
+
+static void hip04_timeout(struct net_device *ndev)
+{
+	hip04_mac_stop(ndev);
+	hip04_mac_open(ndev);
+	return;
+}
+
+static const struct net_device_ops hip04_netdev_ops = {
+	.ndo_open		= hip04_mac_open,
+	.ndo_stop		= hip04_mac_stop,
+	.ndo_start_xmit		= hip04_mac_start_xmit,
+	.ndo_set_mac_address	= hip04_set_mac_address,
+	.ndo_tx_timeout         = hip04_timeout,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_change_mtu		= eth_change_mtu,
+};
+
+static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->tx_desc = dma_alloc_coherent(d,
+			TX_DESC_NUM * sizeof(struct tx_desc),
+			&priv->tx_desc_dma, GFP_KERNEL);
+	if (!priv->tx_desc)
+		return -ENOMEM;
+
+	priv->rx_buf_size = RX_BUF_SIZE +
+			    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		priv->rx_buf[i] = netdev_alloc_frag(priv->rx_buf_size);
+		if (!priv->rx_buf[i])
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void hip04_free_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	for (i = 0; i < RX_DESC_NUM; i++)
+		if (priv->rx_buf[i])
+			put_page(virt_to_head_page(priv->rx_buf[i]));
+
+	for (i = 0; i < TX_DESC_NUM; i++)
+		if (priv->tx_skb[i])
+			dev_kfree_skb_any(priv->tx_skb[i]);
+
+	dma_free_coherent(d, TX_DESC_NUM * sizeof(struct tx_desc),
+			priv->tx_desc, priv->tx_desc_dma);
+}
+
+static int hip04_mac_probe(struct platform_device *pdev)
+{
+	struct device *d = &pdev->dev;
+	struct device_node *node = d->of_node;
+	struct of_phandle_args arg;
+	struct net_device *ndev;
+	struct hip04_priv *priv;
+	struct resource *res;
+	unsigned int irq;
+	int ret;
+
+	ndev = alloc_etherdev(sizeof(struct hip04_priv));
+	if (!ndev)
+		return -ENOMEM;
+
+	priv = netdev_priv(ndev);
+	priv->ndev = ndev;
+	platform_set_drvdata(pdev, ndev);
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_ioremap_resource(d, res);
+	if (IS_ERR(priv->base)) {
+		ret = PTR_ERR(priv->base);
+		goto init_fail;
+	}
+
+	ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, &arg);
+	if (ret < 0) {
+		dev_warn(d, "no port-handle\n");
+		goto init_fail;
+	}
+
+	priv->port = arg.args[0];
+	priv->chan = arg.args[1];
+
+	priv->map = syscon_node_to_regmap(arg.np);
+	if (IS_ERR(priv->map)) {
+		dev_warn(d, "no syscon hisilicon,hip04-ppe\n");
+		ret = PTR_ERR(priv->map);
+		goto init_fail;
+	}
+
+	priv->phy_mode = of_get_phy_mode(node);
+	if (priv->phy_mode < 0) {
+		dev_warn(d, "phy-mode not found\n");
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq <= 0) {
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	ndev->netdev_ops = &hip04_netdev_ops;
+	ndev->watchdog_timeo = TX_TIMEOUT;
+	ndev->priv_flags |= IFF_UNICAST_FLT;
+	ndev->irq = irq;
+	netif_napi_add(ndev, &priv->napi, hip04_rx_poll, RX_DESC_NUM);
+	SET_NETDEV_DEV(ndev, &pdev->dev);
+
+	hip04_reset_ppe(priv);
+	if (priv->phy_mode == PHY_INTERFACE_MODE_MII)
+		hip04_config_port(ndev, SPEED_100, DUPLEX_FULL);
+
+	hip04_config_fifo(priv);
+	random_ether_addr(ndev->dev_addr);
+	hip04_update_mac_address(ndev);
+
+	ret = hip04_alloc_ring(ndev, d);
+	if (ret) {
+		netdev_err(ndev, "alloc ring failed\n");
+		goto alloc_fail;
+	}
+
+	ret = devm_request_irq(d, irq, hip04_mac_interrupt,
+				0, pdev->name, ndev);
+	if (ret) {
+		netdev_err(ndev, "devm_request_irq failed\n");
+		goto alloc_fail;
+	}
+
+	priv->phy_node = of_parse_phandle(node, "phy-handle", 0);
+	if (priv->phy_node) {
+		priv->phy = of_phy_connect(ndev, priv->phy_node,
+			&hip04_adjust_link, 0, priv->phy_mode);
+		if (!priv->phy) {
+			ret = -EPROBE_DEFER;
+			goto alloc_fail;
+		}
+	}
+
+	setup_timer(&priv->txtimer, hip04_xmit_timer, (unsigned long) ndev);
+	ret = register_netdev(ndev);
+	if (ret)
+		goto alloc_fail;
+
+	return 0;
+
+alloc_fail:
+	hip04_free_ring(ndev, d);
+init_fail:
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+	return ret;
+}
+
+static int hip04_remove(struct platform_device *pdev)
+{
+	struct net_device *ndev = platform_get_drvdata(pdev);
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct device *d = &pdev->dev;
+
+	if (priv->phy)
+		phy_disconnect(priv->phy);
+
+	del_timer_sync(&priv->txtimer);
+	unregister_netdev(ndev);
+	hip04_free_ring(ndev, d);
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+
+	return 0;
+}
+
+static const struct of_device_id hip04_mac_match[] = {
+	{ .compatible = "hisilicon,hip04-mac" },
+	{ }
+};
+
+static struct platform_driver hip04_mac_driver = {
+	.probe	= hip04_mac_probe,
+	.remove	= hip04_remove,
+	.driver	= {
+		.name		= DRV_NAME,
+		.owner		= THIS_MODULE,
+		.of_match_table	= hip04_mac_match,
+	},
+};
+module_platform_driver(hip04_mac_driver);
+
+MODULE_DESCRIPTION("HiSilicon HIP04 Ethernet driver");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:hip04-ether");
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
@ 2014-03-28 15:36   ` Zhangfei Gao
  0 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-28 15:36 UTC (permalink / raw)
  To: linux-arm-kernel

Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
---
 drivers/net/ethernet/hisilicon/Makefile    |    2 +-
 drivers/net/ethernet/hisilicon/hip04_eth.c |  769 ++++++++++++++++++++++++++++
 2 files changed, 770 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c

diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 1d6eb6e..5af9b54 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -2,4 +2,4 @@
 # Makefile for the HISILICON network device drivers.
 #
 
-obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
+obj-$(CONFIG_HIP04_ETH) +=hip04_mdio.o hip04_eth.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
new file mode 100644
index 0000000..efff711
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -0,0 +1,769 @@
+
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+#include <linux/of_address.h>
+#include <linux/phy.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
+#include <linux/mfd/syscon.h>
+#include <linux/regmap.h>
+
+#define PPE_CFG_RX_ADDR			0x100
+#define PPE_CFG_POOL_GRP		0x300
+#define PPE_CFG_RX_BUF_SIZE		0x400
+#define PPE_CFG_RX_FIFO_SIZE		0x500
+#define PPE_CURR_BUF_CNT		0xa200
+
+#define GE_DUPLEX_TYPE			0x8
+#define GE_MAX_FRM_SIZE_REG		0x3c
+#define GE_PORT_MODE			0x40
+#define GE_PORT_EN			0x44
+#define GE_SHORT_RUNTS_THR_REG		0x50
+#define GE_TX_LOCAL_PAGE_REG		0x5c
+#define GE_TRANSMIT_CONTROL_REG		0x60
+#define GE_CF_CRC_STRIP_REG		0x1b0
+#define GE_MODE_CHANGE_EN		0x1b4
+#define GE_RECV_CONTROL_REG		0x1e0
+#define GE_STATION_MAC_ADDRESS		0x210
+#define PPE_CFG_TX_PKT_BD_ADDR		0x420
+#define PPE_CFG_MAX_FRAME_LEN_REG	0x408
+#define PPE_CFG_BUS_CTRL_REG		0x424
+#define PPE_CFG_RX_CTRL_REG		0x428
+#define PPE_CFG_RX_PKT_MODE_REG		0x438
+#define PPE_CFG_QOS_VMID_GEN		0x500
+#define PPE_CFG_RX_PKT_INT		0x538
+#define PPE_INTEN			0x600
+#define PPE_INTSTS			0x608
+#define PPE_RINT			0x604
+#define PPE_CFG_STS_MODE		0x700
+#define PPE_HIS_RX_PKT_CNT		0x804
+
+/* REG_INTERRUPT */
+#define RCV_INT				BIT(10)
+#define RCV_NOBUF			BIT(8)
+#define DEF_INT_MASK			(RCV_INT | RCV_NOBUF)
+
+#define RX_DESC_NUM			64
+#define TX_DESC_NUM			64
+#define TX_NEXT(N)			(((N) + 1) & (TX_DESC_NUM-1))
+#define RX_NEXT(N)			(((N) + 1) & (RX_DESC_NUM-1))
+
+#define GMAC_PPE_RX_PKT_MAX_LEN		379
+#define GMAC_MAX_PKT_LEN		1516
+#define DESC_DEF_CFG			0x14
+#define RX_BUF_SIZE			1600
+#define RX_PKT_ERR			0x3
+#define TX_TIMEOUT			(6 * HZ)
+#define RECLAIM_PERIOD			HZ
+
+#define DRV_NAME			"hip04-ether"
+
+struct tx_desc {
+	u32 send_addr;
+	u16 reserved_16;
+	u16 send_size;
+	u32 reserved_32;
+	u32 cfg;
+	u32 wb_addr;
+} ____cacheline_aligned;
+
+struct rx_desc {
+	u16 reserved_16;
+	u16 pkt_len;
+	u32 reserve1[3];
+	u32 pkt_err;
+	u32 reserve2[4];
+};
+
+struct hip04_priv {
+	void __iomem *base;
+	int phy_mode;
+	int chan;
+	unsigned int port;
+	unsigned int speed;
+	unsigned int duplex;
+	unsigned int reg_inten;
+
+	struct napi_struct napi;
+	struct net_device *ndev;
+
+	struct tx_desc *tx_desc;
+	dma_addr_t tx_desc_dma;
+	struct sk_buff *tx_skb[TX_DESC_NUM];
+	dma_addr_t tx_phys[TX_DESC_NUM];
+	spinlock_t lock;
+	unsigned int tx_head;
+	unsigned int tx_tail;
+	unsigned int tx_count;
+
+	unsigned char *rx_buf[RX_DESC_NUM];
+	dma_addr_t rx_phys[RX_DESC_NUM];
+	unsigned int rx_head;
+	unsigned int rx_buf_size;
+
+	struct device_node *phy_node;
+	struct phy_device *phy;
+	struct regmap *map;
+	struct timer_list txtimer;
+};
+
+static void hip04_config_port(struct net_device *ndev, u32 speed, u32 duplex)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	priv->speed = speed;
+	priv->duplex = duplex;
+
+	switch (priv->phy_mode) {
+	case PHY_INTERFACE_MODE_SGMII:
+		if (speed == SPEED_1000)
+			val = 8;
+		else if (speed == SPEED_100)
+			val = 7;
+		else
+			val = 6;	/* SPEED_10 */
+		break;
+	case PHY_INTERFACE_MODE_MII:
+		if (speed == SPEED_100)
+			val = 1;
+		else
+			val = 0;	/* SPEED_10 */
+		break;
+	default:
+		netdev_warn(ndev, "not supported mode\n");
+		val = 0;
+		break;
+	}
+	writel_relaxed(val, priv->base + GE_PORT_MODE);
+
+	val = (duplex) ? BIT(0) : 0;
+	writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
+
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_MODE_CHANGE_EN);
+}
+
+static void hip04_reset_ppe(struct hip04_priv *priv)
+{
+	u32 val, tmp;
+
+	do {
+		regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
+		regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
+	} while (val & 0xfff);
+}
+
+static void hip04_config_fifo(struct hip04_priv *priv)
+{
+	u32 val;
+
+	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
+	val |= BIT(12);			/* PPE_HIS_RX_PKT_CNT read clear */
+	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
+
+	val = BIT(priv->port);
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
+
+	val = priv->port << 8;
+	val |= BIT(14);
+	writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
+
+	val = RX_BUF_SIZE;
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
+
+	val = RX_DESC_NUM << 16;	/* depth */
+	val |= BIT(11);			/* seq: first set first ues */
+	val |= RX_DESC_NUM * priv->chan;	/* start_addr */
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_FIFO_SIZE, val);
+
+	/* pkt store format */
+	val = NET_IP_ALIGN << 11;	/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
+
+	/* following cfg required for 1000M */
+	/* pkt mode */
+	val = BIT(18);			/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
+
+	/* set bus ctrl */
+	val = BIT(14);			/* buffer locally release */
+	val |= BIT(0);			/* big endian */
+	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
+
+	/* set max pkt len, curtail if exceed */
+	val = GMAC_PPE_RX_PKT_MAX_LEN;	/* max buffer len */
+	writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
+
+	/* set max len of each pkt */
+	val = GMAC_MAX_PKT_LEN;		/* max buffer len */
+	writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
+
+	/* set min len of each pkt */
+	val = 31;			/* min buffer len */
+	writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
+
+	/* tx */
+	val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
+	val |= BIT(5);			/* tx auto neg */
+	val |= BIT(6);			/* tx add crc */
+	val |= BIT(7);			/* tx short pad through */
+	writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
+
+	/* rx crc */
+	val = BIT(0);			/* rx strip crc */
+	writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
+
+	/* rx */
+	val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
+	val |= BIT(3);			/* rx strip pad */
+	val |= BIT(4);			/* run pkt en */
+	writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
+
+	/* auto neg control */
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
+}
+
+static void hip04_mac_enable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* enable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val |= BIT(1);		/* rx*/
+	val |= BIT(2);		/* tx*/
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+
+	/* clear rx int */
+	val = RCV_INT;
+	writel_relaxed(val, priv->base + PPE_RINT);
+
+	/* config recv int*/
+	val = BIT(6);		/* int threshold 1 package */
+	val |= 0x4;		/* recv timeout */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
+
+	/* enable interrupt */
+	priv->reg_inten = DEF_INT_MASK;
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+}
+
+static void hip04_mac_disable(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	/* disable int */
+	priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+	writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+	/* disable tx & rx */
+	val = readl_relaxed(priv->base + GE_PORT_EN);
+	val &= ~(BIT(1));	/* rx*/
+	val &= ~(BIT(2));	/* tx*/
+	writel_relaxed(val, priv->base + GE_PORT_EN);
+}
+
+static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	writel(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
+}
+
+static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
+}
+
+static u32 hip04_recv_cnt(struct hip04_priv *priv)
+{
+	return readl(priv->base + PPE_HIS_RX_PKT_CNT);
+}
+
+static void hip04_update_mac_address(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
+			priv->base + GE_STATION_MAC_ADDRESS);
+	writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
+			(ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
+			priv->base + GE_STATION_MAC_ADDRESS + 4);
+}
+
+static int hip04_set_mac_address(struct net_device *ndev, void *addr)
+{
+	eth_mac_addr(ndev, addr);
+	hip04_update_mac_address(ndev);
+	return 0;
+}
+
+static void hip04_tx_reclaim(struct net_device *ndev, bool force)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	unsigned tx_head = priv->tx_head;
+	unsigned tx_tail = priv->tx_tail;
+	struct tx_desc *desc;
+
+	spin_lock_bh(&priv->lock);
+	while ((tx_tail != tx_head) || (priv->tx_count == TX_DESC_NUM)) {
+		if (priv->tx_phys[tx_tail]) {
+			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
+				priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
+			priv->tx_phys[tx_tail] = 0;
+		}
+		desc = &priv->tx_desc[priv->tx_tail];
+		if (desc->send_addr != 0) {
+			if (force)
+				desc->send_addr = 0;
+			else
+				break;
+		}
+		dev_kfree_skb(priv->tx_skb[tx_tail]);
+		priv->tx_skb[tx_tail] = NULL;
+		tx_tail = TX_NEXT(tx_tail);
+		priv->tx_count--;
+	}
+	priv->tx_tail = tx_tail;
+	spin_unlock_bh(&priv->lock);
+
+	if (priv->tx_count)
+		mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
+
+	if (unlikely(netif_queue_stopped(ndev)) &&
+		(priv->tx_count < TX_DESC_NUM))
+		netif_wake_queue(ndev);
+}
+
+static void hip04_xmit_timer(unsigned long data)
+{
+	struct net_device *ndev = (void *)data;
+
+	hip04_tx_reclaim(ndev, false);
+}
+
+static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int tx_head = priv->tx_head;
+	struct tx_desc *desc = &priv->tx_desc[tx_head];
+	dma_addr_t phys;
+
+	hip04_tx_reclaim(ndev, false);
+	mod_timer(&priv->txtimer, jiffies + RECLAIM_PERIOD);
+
+	if (priv->tx_count >= TX_DESC_NUM) {
+		netif_stop_queue(ndev);
+		return NETDEV_TX_BUSY;
+	}
+
+	phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
+	if (dma_mapping_error(&ndev->dev, phys)) {
+		dev_kfree_skb(skb);
+		return NETDEV_TX_OK;
+	}
+
+	priv->tx_skb[tx_head] = skb;
+	priv->tx_phys[tx_head] = phys;
+	desc->send_addr = cpu_to_be32(phys);
+	desc->send_size = cpu_to_be16(skb->len);
+	desc->cfg = cpu_to_be32(DESC_DEF_CFG);
+	phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
+	desc->wb_addr = cpu_to_be32(phys);
+	skb_tx_timestamp(skb);
+
+	/* Don't wait for transmitted skbs to be freed. */
+	skb_orphan(skb);
+
+	hip04_set_xmit_desc(priv, phys);
+	priv->tx_head = TX_NEXT(tx_head);
+
+	stats->tx_bytes += skb->len;
+	stats->tx_packets++;
+	priv->tx_count++;
+
+	return NETDEV_TX_OK;
+}
+
+static int hip04_rx_poll(struct napi_struct *napi, int budget)
+{
+	struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
+	struct net_device *ndev = priv->ndev;
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int cnt = hip04_recv_cnt(priv);
+	struct rx_desc *desc;
+	struct sk_buff *skb;
+	unsigned char *buf;
+	bool last = false;
+	dma_addr_t phys;
+	int rx = 0;
+	u16 len;
+	u32 err;
+
+	while (cnt && !last) {
+		buf = priv->rx_buf[priv->rx_head];
+		skb = build_skb(buf, priv->rx_buf_size);
+		if (unlikely(!skb)) {
+			net_dbg_ratelimited("build_skb failed\n");
+			break;
+		}
+
+		dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_phys[priv->rx_head] = 0;
+
+		desc = (struct rx_desc *)skb->data;
+		len = be16_to_cpu(desc->pkt_len);
+		err = be32_to_cpu(desc->pkt_err);
+
+		if (len > RX_BUF_SIZE)
+			len = RX_BUF_SIZE;
+
+		if (len == 0) {
+			dev_kfree_skb_any(skb);
+			last = true;
+		} else if (err & RX_PKT_ERR) {
+			dev_kfree_skb_any(skb);
+			stats->rx_dropped++;
+			stats->rx_errors++;
+		} else {
+			skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+			skb_put(skb, len);
+			skb->protocol = eth_type_trans(skb, ndev);
+			napi_gro_receive(&priv->napi, skb);
+			stats->rx_packets++;
+			stats->rx_bytes += len;
+			rx++;
+		}
+
+		buf = netdev_alloc_frag(priv->rx_buf_size);
+		if (!buf)
+			break;	/* a NAPI poll must not return an error code */
+		phys = dma_map_single(&ndev->dev, buf,
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			break;
+		priv->rx_buf[priv->rx_head] = buf;
+		priv->rx_phys[priv->rx_head] = phys;
+		hip04_set_recv_desc(priv, phys);
+
+		priv->rx_head = RX_NEXT(priv->rx_head);
+		if (rx >= budget)
+			break;
+
+		if (--cnt == 0)
+			cnt = hip04_recv_cnt(priv);
+	}
+
+	if (rx < budget) {
+		napi_complete(napi);
+
+		/* enable rx interrupt */
+		priv->reg_inten |= RCV_INT | RCV_NOBUF;
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+	}
+
+	return rx;
+}
+
+static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
+{
+	struct net_device *ndev = (struct net_device *) dev_id;
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
+
+	writel_relaxed(DEF_INT_MASK, priv->base + PPE_RINT);
+
+	if (ists & (RCV_INT | RCV_NOBUF)) {
+		/* disable rx interrupt */
+		priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+		napi_schedule(&priv->napi);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void hip04_adjust_link(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct phy_device *phy = priv->phy;
+
+	if ((priv->speed != phy->speed) || (priv->duplex != phy->duplex)) {
+		hip04_config_port(ndev, phy->speed, phy->duplex);
+		phy_print_status(phy);
+	}
+}
+
+static int hip04_mac_open(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->rx_head = 0;
+	priv->tx_head = 0;
+	priv->tx_tail = 0;
+	priv->tx_count = 0;
+
+	hip04_reset_ppe(priv);
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		dma_addr_t phys;
+
+		phys = dma_map_single(&ndev->dev, priv->rx_buf[i],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		if (dma_mapping_error(&ndev->dev, phys))
+			return -EIO;
+
+		priv->rx_phys[i] = phys;
+		hip04_set_recv_desc(priv, phys);
+	}
+
+	if (priv->phy)
+		phy_start(priv->phy);
+
+	netif_start_queue(ndev);
+	hip04_mac_enable(ndev);
+	napi_enable(&priv->napi);
+
+	return 0;
+}
+
+static int hip04_mac_stop(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	napi_disable(&priv->napi);
+	netif_stop_queue(ndev);
+	hip04_mac_disable(ndev);
+	hip04_tx_reclaim(ndev, true);
+	hip04_reset_ppe(priv);
+
+	if (priv->phy)
+		phy_stop(priv->phy);
+
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		if (priv->rx_phys[i]) {
+			dma_unmap_single(&ndev->dev, priv->rx_phys[i],
+					RX_BUF_SIZE, DMA_FROM_DEVICE);
+			priv->rx_phys[i] = 0;
+		}
+	}
+
+	return 0;
+}
+
+static void hip04_timeout(struct net_device *ndev)
+{
+	hip04_mac_stop(ndev);
+	hip04_mac_open(ndev);
+}
+
+static const struct net_device_ops hip04_netdev_ops = {
+	.ndo_open		= hip04_mac_open,
+	.ndo_stop		= hip04_mac_stop,
+	.ndo_start_xmit		= hip04_mac_start_xmit,
+	.ndo_set_mac_address	= hip04_set_mac_address,
+	.ndo_tx_timeout         = hip04_timeout,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_change_mtu		= eth_change_mtu,
+};
+
+static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->tx_desc = dma_alloc_coherent(d,
+			TX_DESC_NUM * sizeof(struct tx_desc),
+			&priv->tx_desc_dma, GFP_KERNEL);
+	if (!priv->tx_desc)
+		return -ENOMEM;
+
+	priv->rx_buf_size = RX_BUF_SIZE +
+			    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		priv->rx_buf[i] = netdev_alloc_frag(priv->rx_buf_size);
+		if (!priv->rx_buf[i])
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void hip04_free_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	for (i = 0; i < RX_DESC_NUM; i++)
+		if (priv->rx_buf[i])
+			put_page(virt_to_head_page(priv->rx_buf[i]));
+
+	for (i = 0; i < TX_DESC_NUM; i++)
+		if (priv->tx_skb[i])
+			dev_kfree_skb_any(priv->tx_skb[i]);
+
+	dma_free_coherent(d, TX_DESC_NUM * sizeof(struct tx_desc),
+			priv->tx_desc, priv->tx_desc_dma);
+}
+
+static int hip04_mac_probe(struct platform_device *pdev)
+{
+	struct device *d = &pdev->dev;
+	struct device_node *node = d->of_node;
+	struct of_phandle_args arg;
+	struct net_device *ndev;
+	struct hip04_priv *priv;
+	struct resource *res;
+	int irq;
+	int ret;
+
+	ndev = alloc_etherdev(sizeof(struct hip04_priv));
+	if (!ndev)
+		return -ENOMEM;
+
+	priv = netdev_priv(ndev);
+	priv->ndev = ndev;
+	platform_set_drvdata(pdev, ndev);
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_ioremap_resource(d, res);
+	if (IS_ERR(priv->base)) {
+		ret = PTR_ERR(priv->base);
+		goto init_fail;
+	}
+
+	ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, &arg);
+	if (ret < 0) {
+		dev_warn(d, "no port-handle\n");
+		goto init_fail;
+	}
+
+	priv->port = arg.args[0];
+	priv->chan = arg.args[1];
+
+	priv->map = syscon_node_to_regmap(arg.np);
+	if (IS_ERR(priv->map)) {
+		dev_warn(d, "no syscon hisilicon,hip04-ppe\n");
+		ret = PTR_ERR(priv->map);
+		goto init_fail;
+	}
+
+	priv->phy_mode = of_get_phy_mode(node);
+	if (priv->phy_mode < 0) {
+		dev_warn(d, "phy-mode not found\n");
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq <= 0) {
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	ether_setup(ndev);
+	ndev->netdev_ops = &hip04_netdev_ops;
+	ndev->watchdog_timeo = TX_TIMEOUT;
+	ndev->priv_flags |= IFF_UNICAST_FLT;
+	ndev->irq = irq;
+	netif_napi_add(ndev, &priv->napi, hip04_rx_poll, RX_DESC_NUM);
+	SET_NETDEV_DEV(ndev, &pdev->dev);
+
+	hip04_reset_ppe(priv);
+	if (priv->phy_mode == PHY_INTERFACE_MODE_MII)
+		hip04_config_port(ndev, SPEED_100, DUPLEX_FULL);
+
+	hip04_config_fifo(priv);
+	random_ether_addr(ndev->dev_addr);
+	hip04_update_mac_address(ndev);
+
+	ret = hip04_alloc_ring(ndev, d);
+	if (ret) {
+		netdev_err(ndev, "alloc ring fail\n");
+		goto alloc_fail;
+	}
+
+	ret = devm_request_irq(d, irq, hip04_mac_interrupt,
+				0, pdev->name, ndev);
+	if (ret) {
+		netdev_err(ndev, "devm_request_irq failed\n");
+		goto alloc_fail;
+	}
+
+	priv->phy_node = of_parse_phandle(node, "phy-handle", 0);
+	if (priv->phy_node) {
+		priv->phy = of_phy_connect(ndev, priv->phy_node,
+			&hip04_adjust_link, 0, priv->phy_mode);
+		if (!priv->phy) {
+			ret = -EPROBE_DEFER;
+			goto alloc_fail;
+		}
+	}
+
+	setup_timer(&priv->txtimer, hip04_xmit_timer, (unsigned long) ndev);
+	ret = register_netdev(ndev);
+	if (ret)
+		goto alloc_fail;
+
+	return 0;
+
+alloc_fail:
+	hip04_free_ring(ndev, d);
+init_fail:
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+	return ret;
+}
+
+static int hip04_remove(struct platform_device *pdev)
+{
+	struct net_device *ndev = platform_get_drvdata(pdev);
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct device *d = &pdev->dev;
+
+	unregister_netdev(ndev);
+
+	if (priv->phy)
+		phy_disconnect(priv->phy);
+
+	del_timer_sync(&priv->txtimer);
+	hip04_free_ring(ndev, d);
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+
+	return 0;
+}
+
+static const struct of_device_id hip04_mac_match[] = {
+	{ .compatible = "hisilicon,hip04-mac" },
+	{ }
+};
+
+static struct platform_driver hip04_mac_driver = {
+	.probe	= hip04_mac_probe,
+	.remove	= hip04_remove,
+	.driver	= {
+		.name		= DRV_NAME,
+		.owner		= THIS_MODULE,
+		.of_match_table	= hip04_mac_match,
+	},
+};
+module_platform_driver(hip04_mac_driver);
+
+MODULE_DESCRIPTION("HISILICON HIP04 Ethernet driver");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:hip04-ether");
-- 
1.7.9.5


* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-25 17:54                   ` Arnd Bergmann
@ 2014-03-27 12:53                     ` zhangfei
  -1 siblings, 0 replies; 148+ messages in thread
From: zhangfei @ 2014-03-27 12:53 UTC (permalink / raw)
  To: Arnd Bergmann, Eric Dumazet
  Cc: Florian Fainelli, Zhangfei Gao, linux-arm-kernel, Mark Rutland,
	devicetree, Russell King - ARM Linux, Sergei Shtylyov, netdev,
	David S. Miller



On 03/26/2014 01:54 AM, Arnd Bergmann wrote:
> On Tuesday 25 March 2014 10:21:42 Eric Dumazet wrote:
>> On Tue, 2014-03-25 at 18:05 +0100, Arnd Bergmann wrote:
>>> On Tuesday 25 March 2014 10:00:30 Florian Fainelli wrote:
>>>
>>>> Using a timer to ensure completion of TX packets is a trick that
>>>> worked in the past, but now that the networking stack got smarter,
>>>> this might artificially increase the processing time of packets in the
>>>> transmit path, and this will defeat features like TCP small queues
>>>> etc.. as could be seen with the mvneta driver [1]. The best way really
>>>> is to rely on TX completion interrupts when those exist as they cannot
>>>> lie about the hardware status (in theory) and they should provide the
>>>> fastest way to complete TX packets.
>>>
>>> But as Zhangfei Gao pointed out, this hardware does not have a working
>>> TX completion interrupt. Using timers to do this has always just been
>>> a workaround for broken hardware IMHO.
>>
>> For this kind of drivers, calling skb_orphan() from ndo_start_xmit() is
>> mandatory.
>
> Cool, thanks for the information, I was wondering already if there was
> a way to deal with hardware like this.
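
Eric's skb_orphan() point can be sketched with a small userspace model
(the struct and function names below are illustrative stand-ins, not the
real kernel API): the socket is charged for bytes sitting in the TX ring,
and orphaning runs the destructor at xmit time, so a driver with no TX
completion interrupt cannot keep the socket throttled until reclaim
finally runs.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative model: a socket charged for bytes still in flight. */
struct sock { int sk_wmem; };

struct skb {
	struct sock *sk;
	int truesize;
	void (*destructor)(struct skb *skb);
};

/* Model of sock_wfree(): return the skb's memory budget to the socket. */
static void sock_wfree_model(struct skb *skb)
{
	skb->sk->sk_wmem -= skb->truesize;
	skb->sk = NULL;
	skb->destructor = NULL;
}

/* Model of skb_orphan(): run the destructor now, at xmit time, instead
 * of whenever the driver finally reclaims the descriptor. */
static void skb_orphan_model(struct skb *skb)
{
	if (skb->destructor)
		skb->destructor(skb);
}
```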
>

That's great.
In our experiments, keeping the reclaim in ndo_start_xmit always gets the
best throughput; it is also simpler and does not even require a spin_lock.
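
The reclaim-in-xmit scheme can be modeled in userspace as follows (a
sketch only; the struct and names are illustrative, not the driver's --
"done" stands in for the hardware write-back that hip04_tx_reclaim()
checks). Because reclaim runs in the same context as queueing, head/tail
never race and no spinlock is needed.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_DESC_NUM 8
#define TX_NEXT(i) (((i) + 1) % TX_DESC_NUM)

/* Illustrative model of the TX descriptor ring. */
struct tx_ring {
	bool done[TX_DESC_NUM];		/* hardware completion flag per slot */
	unsigned int head, tail, count;
};

/* Reclaim every completed slot, oldest first. */
static void tx_reclaim_model(struct tx_ring *r)
{
	while (r->count && r->done[r->tail]) {
		r->done[r->tail] = false;	/* slot is free again */
		r->tail = TX_NEXT(r->tail);
		r->count--;
	}
}

/* Queue one packet; false means "ring full" (NETDEV_TX_BUSY). */
static bool xmit_model(struct tx_ring *r)
{
	tx_reclaim_model(r);		/* reclaim on every transmit */
	if (r->count == TX_DESC_NUM)
		return false;
	r->head = TX_NEXT(r->head);
	r->count++;
	return true;
}
```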

By the way, I still have some confusion about build_skb.

At first, I thought we could allocate n buffers as a ring and keep
reusing them for DMA: every time a packet comes in, build_skb adds a
head and sends it to the upper layer, and after the data is consumed we
can reuse the buffer next time.

However, in an iperf stress test, errors always happen.
The buffer is in fact released, and we need to allocate a new buffer for
the next transfer.

So build_skb is not meant for reusing buffers, but only for keeping
hot data in cache, right?
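
For reference, the observed behavior matches build_skb()'s ownership
model: the skb wraps the buffer without copying, and the skb's free path
releases the buffer with it. A minimal userspace model (illustrative
names only, not the kernel API):

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative model: build_skb() wraps the given buffer without
 * copying, and freeing the skb releases that buffer. */
struct skb_model { unsigned char *data; };

static struct skb_model *build_skb_model(unsigned char *buf)
{
	struct skb_model *skb = malloc(sizeof(*skb));

	if (skb)
		skb->data = buf;	/* zero-copy: points into the rx buffer */
	return skb;
}

/* The skb's free path drops the buffer reference; after this the driver
 * must not touch the old buffer and must refill the ring slot. */
static void kfree_skb_model(struct skb_model *skb)
{
	free(skb->data);
	free(skb);
}
```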

Thanks


* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-24 16:32     ` Florian Fainelli
@ 2014-03-27  6:27       ` Zhangfei Gao
  -1 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-27  6:27 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Zhangfei Gao, Mark Rutland, devicetree, Russell King,
	Sergei Shtylyov, Arnd Bergmann, netdev, David Miller,
	linux-arm-kernel

Dear Florian,

Thanks for the kind suggestion.

On Tue, Mar 25, 2014 at 12:32 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> 2014-03-24 7:14 GMT-07:00 Zhangfei Gao <zhangfei.gao@linaro.org>:
>> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller
>>
>> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
>> ---
>>  drivers/net/ethernet/hisilicon/Makefile    |    2 +-
>>  drivers/net/ethernet/hisilicon/hip04_eth.c |  728 ++++++++++++++++++++++++++++
>>  2 files changed, 729 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c
>
> [snip]
>
>> +static void hip04_config_port(struct hip04_priv *priv, u32 speed, u32 duplex)
>> +{
>> +       u32 val;
>> +
>> +       priv->speed = speed;
>> +       priv->duplex = duplex;
>> +
>> +       switch (priv->phy_mode) {
>> +       case PHY_INTERFACE_MODE_SGMII:
>> +               if (speed == SPEED_1000)
>> +                       val = 8;
>> +               else
>> +                       val = 7;
>> +               break;
>> +       case PHY_INTERFACE_MODE_MII:
>> +               val = 1;        /* SPEED_100 */
>> +               break;
>> +       default:
>> +               val = 0;
>> +               break;
>
> Is 0 valid for e.g: 10Mbits/sec, regardless of the phy_mode?

0 means 10M only in MII mode; I will add a warning here.

        switch (priv->phy_mode) {
        case PHY_INTERFACE_MODE_SGMII:
                if (speed == SPEED_1000)
                        val = 8;
                else if (speed == SPEED_100)
                        val = 7;
                else
                        val = 6;        /* SPEED_10 */
                break;
        case PHY_INTERFACE_MODE_MII:
                if (speed == SPEED_100)
                        val = 1;
                else
                        val = 0;        /* SPEED_10 */
                break;
        default:
                netdev_warn(ndev, "not supported mode\n");
                val = 0;
                break;
        }

>> +
>> +static void hip04_mac_enable(struct net_device *ndev, bool enable)
>> +{
>> +       struct hip04_priv *priv = netdev_priv(ndev);
>> +       u32 val;
>> +
>> +       if (enable) {
>> +               /* enable tx & rx */
>> +               val = readl_relaxed(priv->base + GE_PORT_EN);
>> +               val |= BIT(1);          /* rx*/
>> +               val |= BIT(2);          /* tx*/
>> +               writel_relaxed(val, priv->base + GE_PORT_EN);
>> +
>> +               /* enable interrupt */
>> +               priv->reg_inten = DEF_INT_MASK;
>> +               writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
>> +
>> +               /* clear rx int */
>> +               val = RCV_INT;
>> +               writel_relaxed(val, priv->base + PPE_RINT);
>
> Should not you first clear the interrupt and then DEF_INT_MASK? Why is
OK, got it.

> there a RCV_INT applied to PPE_RINT register in the enable path, but
> there is no such thing in the "disable" branch of your function?

This is required here because of the following command (/* config recv int */);
otherwise, the setting does not take effect.

>
>> +
>> +               /* config recv int*/
>> +               val = BIT(6);           /* int threshold 1 package */
>> +               val |= 0x4;             /* recv timeout */
>> +               writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
>> +       } else {
>> +               /* disable int */
>> +               priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
>> +               writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
>> +
>> +               /* disable tx & rx */
>> +               val = readl_relaxed(priv->base + GE_PORT_EN);
>> +               val &= ~(BIT(1));       /* rx*/
>> +               val &= ~(BIT(2));       /* tx*/
>> +               writel_relaxed(val, priv->base + GE_PORT_EN);
>> +       }
>
> There is little to no sharing between the two branches, I would have
> created separate helper functions for the enable/disable logic.
OK, got it.

>
>> +}
>> +
>> +static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
>> +{
>> +       writel(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
>
> This is not 64-bits/LPAE safe, do you have a High address part and a
> Low address part for your address in the buffer descriptor address, if
> so, better use it now.

Unfortunately it is true: this controller on the A15 only supports
32-bit addresses.
Bits [33:32] of the descriptor can be set in bits [5:4], but they may be
ignored, and the RX register only has 32 bits as well.
So the controller is 32-bit only.

The next version can be used on 64-bit systems and does have a high
address part; we have not received that spec yet.
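
When that spec arrives, the usual pattern is two 32-bit register writes.
A userspace sketch of the split, modeled on the kernel's
lower_32_bits()/upper_32_bits() helpers (the register pair and its names
here are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;	/* what a 64-bit dma_addr_t config gives */

/* Userspace equivalents of the kernel's lower_32_bits()/upper_32_bits(). */
static uint32_t lo32(dma_addr_t addr) { return (uint32_t)addr; }
static uint32_t hi32(dma_addr_t addr) { return (uint32_t)(addr >> 32); }

/* Hypothetical register pair; the real names would come from the spec. */
struct bd_addr_regs { uint32_t lo, hi; };

static void set_xmit_desc_model(struct bd_addr_regs *regs, dma_addr_t phys)
{
	regs->lo = lo32(phys);	/* would be writel(..., BD_ADDR_LO) */
	regs->hi = hi32(phys);	/* would be writel(..., BD_ADDR_HI) */
}
```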

>> +
>> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
>> +{
>> +       struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
>> +       struct net_device *ndev = priv->ndev;
>> +       struct net_device_stats *stats = &ndev->stats;
>> +       unsigned int cnt = hip04_recv_cnt(priv);
>> +       struct sk_buff *skb;
>> +       struct rx_desc *desc;
>> +       unsigned char *buf;
>> +       dma_addr_t phys;
>> +       int rx = 0;
>> +       u16 len;
>> +       u32 err;
>> +
>> +       while (cnt) {
>> +               buf = priv->rx_buf[priv->rx_head];
>> +               skb = build_skb(buf, priv->rx_buf_size);
>> +               if (unlikely(!skb))
>> +                       net_dbg_ratelimited("build_skb failed\n");
>> +
>> +               dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
>> +                               RX_BUF_SIZE, DMA_FROM_DEVICE);
>> +               priv->rx_phys[priv->rx_head] = 0;
>> +
>> +               desc = (struct rx_desc *)skb->data;
>> +               len = be16_to_cpu(desc->pkt_len);
>> +               err = be32_to_cpu(desc->pkt_err);
>> +
>> +               if (len > RX_BUF_SIZE)
>> +                       len = RX_BUF_SIZE;
>> +               if (0 == len)
>> +                       break;
>
> Should not this be a continue? This is an error packet, so you should
> keep on processing the others, or does this have a special meaning?
len = 0 indicates the last packet.
I will change the behavior here:
   if (0 == len) {
                        dev_kfree_skb_any(skb);
                        last = true;
                } else if (err & RX_PKT_ERR) {
                        dev_kfree_skb_any(skb);
                        stats->rx_dropped++;
                        stats->rx_errors++;
                } else {
                        skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
                        skb_put(skb, len);
                        skb->protocol = eth_type_trans(skb, ndev);
                        napi_gro_receive(&priv->napi, skb);
                        stats->rx_packets++;
                        stats->rx_bytes += len;
                }
>
>> +
>> +               if (err & RX_PKT_ERR) {
>> +                       dev_kfree_skb_any(skb);
>> +                       stats->rx_dropped++;
>> +                       stats->rx_errors++;
>> +                       continue;
>> +               }
>> +
>> +               stats->rx_packets++;
>> +               stats->rx_bytes += len;
>> +
>> +               skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
>> +               skb_put(skb, len);
>> +               skb->protocol = eth_type_trans(skb, ndev);
>> +               napi_gro_receive(&priv->napi, skb);
>> +
>> +               buf = netdev_alloc_frag(priv->rx_buf_size);
>> +               if (!buf)
>> +                       return -ENOMEM;
>> +               phys = dma_map_single(&ndev->dev, buf,
>> +                               RX_BUF_SIZE, DMA_FROM_DEVICE);
>
> Missing dma_mapping_error() check here.
Yes, thanks

>
>> +               priv->rx_buf[priv->rx_head] = buf;
>> +               priv->rx_phys[priv->rx_head] = phys;
>> +               hip04_set_recv_desc(priv, phys);
>> +
>> +               priv->rx_head = RX_NEXT(priv->rx_head);
>> +               if (rx++ >= budget)
>> +                       break;
>> +
>> +               if (--cnt == 0)
>> +                       cnt = hip04_recv_cnt(priv);
>
>> +       }
>> +
>> +       if (rx < budget) {
>> +               napi_complete(napi);
>> +
>> +               /* enable rx interrupt */
>> +               priv->reg_inten |= RCV_INT | RCV_NOBUF;
>> +               writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
>> +       }
>> +
>> +       return rx;
>> +}
>> +
>> +static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
>> +{
>> +       struct net_device *ndev = (struct net_device *) dev_id;
>> +       struct hip04_priv *priv = netdev_priv(ndev);
>> +       u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
>> +       u32 val = DEF_INT_MASK;
>> +
>> +       writel_relaxed(val, priv->base + PPE_RINT);
>> +
>> +       if (ists & (RCV_INT | RCV_NOBUF)) {
>> +               if (napi_schedule_prep(&priv->napi)) {
>> +                       /* disable rx interrupt */
>> +                       priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
>> +                       writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
>> +                       __napi_schedule(&priv->napi);
>> +               }
>> +       }
>
> You should also process TX completion interrupts here
There is no such interrupt.
>
>> +
>> +       return IRQ_HANDLED;
>> +}
>> +
>> +static void hip04_tx_reclaim(struct net_device *ndev, bool force)
>> +{
>> +       struct hip04_priv *priv = netdev_priv(ndev);
>> +       unsigned tx_head = priv->tx_head;
>> +       unsigned tx_tail = priv->tx_tail;
>> +       struct tx_desc *desc = &priv->tx_desc[priv->tx_tail];
>> +
>> +       while (tx_tail != tx_head) {
>> +               if (desc->send_addr != 0) {
>> +                       if (force)
>> +                               desc->send_addr = 0;
>> +                       else
>> +                               break;
>> +               }
>> +               if (priv->tx_phys[tx_tail]) {
>> +                       dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
>> +                               priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
>> +                       priv->tx_phys[tx_tail] = 0;
>> +               }
>> +               dev_kfree_skb_irq(priv->tx_skb[tx_tail]);
>
> dev_kfree_skb_irq() bypasses all sort of SKB tracking, you might want
> to use kfree_skb() here instead.
OK, will use dev_kfree_skb instead.

>
>> +               priv->tx_skb[tx_tail] = NULL;
>> +               tx_tail = TX_NEXT(tx_tail);
>> +               priv->tx_count--;
>> +       }
>> +       priv->tx_tail = tx_tail;
>> +}
>> +
>> +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>> +{
>> +       struct hip04_priv *priv = netdev_priv(ndev);
>> +       struct net_device_stats *stats = &ndev->stats;
>> +       unsigned int tx_head = priv->tx_head;
>> +       struct tx_desc *desc = &priv->tx_desc[tx_head];
>> +       dma_addr_t phys;
>> +
>> +       hip04_tx_reclaim(ndev, false);
>> +
>> +       if (priv->tx_count++ >= TX_DESC_NUM) {
>> +               net_dbg_ratelimited("no TX space for packet\n");
>> +               netif_stop_queue(ndev);
>> +               return NETDEV_TX_BUSY;
>> +       }
>> +
>> +       phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
>
> Missing dma_mapping_error() check here
>
>> +       priv->tx_skb[tx_head] = skb;
>> +       priv->tx_phys[tx_head] = phys;
>> +       desc->send_addr = cpu_to_be32(phys);
>> +       desc->send_size = cpu_to_be16(skb->len);
>> +       desc->cfg = cpu_to_be32(DESC_DEF_CFG);
>> +       phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
>> +       desc->wb_addr = cpu_to_be32(phys);
>
> Don't we need a barrier here to ensure that all stores are completed
> before we hand this descriptor address to hip40_set_xmit_desc() which
> should make DMA start processing it?
>
>> +       skb_tx_timestamp(skb);
>> +       hip04_set_xmit_desc(priv, phys);
>> +       priv->tx_head = TX_NEXT(tx_head);
>> +
>> +       stats->tx_bytes += skb->len;
>> +       stats->tx_packets++;
>
> You cannot update the transmit stats here, what start_xmit() does it
> just queue packets for the DMA engine to process them, but that does
> not mean DMA has completed those. You should update statistics in the
> tx_reclaim() function.
Yes; however, since there is no TX completion interrupt, tx_reclaim may be
called rather late, so it may be more suitable to update them here.

Thanks


* [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
@ 2014-03-27  6:27       ` Zhangfei Gao
  0 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-27  6:27 UTC (permalink / raw)
  To: linux-arm-kernel

Dear Florian

Thanks for the kind suggestion.

On Tue, Mar 25, 2014 at 12:32 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> 2014-03-24 7:14 GMT-07:00 Zhangfei Gao <zhangfei.gao@linaro.org>:
>> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller
>>
>> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
>> ---
>>  drivers/net/ethernet/hisilicon/Makefile    |    2 +-
>>  drivers/net/ethernet/hisilicon/hip04_eth.c |  728 ++++++++++++++++++++++++++++
>>  2 files changed, 729 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c
>
> [snip]
>
>> +static void hip04_config_port(struct hip04_priv *priv, u32 speed, u32 duplex)
>> +{
>> +       u32 val;
>> +
>> +       priv->speed = speed;
>> +       priv->duplex = duplex;
>> +
>> +       switch (priv->phy_mode) {
>> +       case PHY_INTERFACE_MODE_SGMII:
>> +               if (speed == SPEED_1000)
>> +                       val = 8;
>> +               else
>> +                       val = 7;
>> +               break;
>> +       case PHY_INTERFACE_MODE_MII:
>> +               val = 1;        /* SPEED_100 */
>> +               break;
>> +       default:
>> +               val = 0;
>> +               break;
>
> Is 0 valid for e.g: 10Mbits/sec, regardless of the phy_mode?

0 is only 10M for MII mode, will add warning here.

        switch (priv->phy_mode) {
        case PHY_INTERFACE_MODE_SGMII:
                if (speed == SPEED_1000)
                        val = 8;
                else if (speed == SPEED_100)
                        val = 7;
                else
                        val = 6;        /* SPEED_10 */
                break;
        case PHY_INTERFACE_MODE_MII:
                if (speed == SPEED_100)
                        val = 1;
                else
                        val = 0;        /* SPEED_10 */
                break;
        default:
                netdev_warn(ndev, "not supported mode\n");
                val = 0;
                break;
        }

>> +
>> +static void hip04_mac_enable(struct net_device *ndev, bool enable)
>> +{
>> +       struct hip04_priv *priv = netdev_priv(ndev);
>> +       u32 val;
>> +
>> +       if (enable) {
>> +               /* enable tx & rx */
>> +               val = readl_relaxed(priv->base + GE_PORT_EN);
>> +               val |= BIT(1);          /* rx*/
>> +               val |= BIT(2);          /* tx*/
>> +               writel_relaxed(val, priv->base + GE_PORT_EN);
>> +
>> +               /* enable interrupt */
>> +               priv->reg_inten = DEF_INT_MASK;
>> +               writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
>> +
>> +               /* clear rx int */
>> +               val = RCV_INT;
>> +               writel_relaxed(val, priv->base + PPE_RINT);
>
> Should not you first clear the interrupt and then DEF_INT_MASK? Why is
OK, got it.

> there a RCV_INT applied to PPE_RINT register in the enable path, but
> there is no such thing in the "disable" branch of your function?

This required here since setting the following cmd, /* config recv int*/
Otherwise, the setting does not take effect.

>
>> +
>> +               /* config recv int*/
>> +               val = BIT(6);           /* int threshold 1 package */
>> +               val |= 0x4;             /* recv timeout */
>> +               writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
>> +       } else {
>> +               /* disable int */
>> +               priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
>> +               writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
>> +
>> +               /* disable tx & rx */
>> +               val = readl_relaxed(priv->base + GE_PORT_EN);
>> +               val &= ~(BIT(1));       /* rx*/
>> +               val &= ~(BIT(2));       /* tx*/
>> +               writel_relaxed(val, priv->base + GE_PORT_EN);
>> +       }
>
> There is little to no sharing between the two branches, I would have
> created separate helper functions for the enable/disable logic.
OK, got it.

>
>> +}
>> +
>> +static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
>> +{
>> +       writel(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
>
> This is not 64-bits/LPAE safe, do you have a High address part and a
> Low address part for your address in the buffer descriptor address, if
> so, better use it now.

Unfortunately it is true, only 32bytes for this controller on A15.
Bits [33:32] of desc can be set in [5:4], but it may be ignored,
RX register is only have 32bits too.
So the controller is only for 32 bits.

The next version can be used on 64bits, and there is high address part.
Still not get spec yet.

>> +
>> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
>> +{
>> +       struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
>> +       struct net_device *ndev = priv->ndev;
>> +       struct net_device_stats *stats = &ndev->stats;
>> +       unsigned int cnt = hip04_recv_cnt(priv);
>> +       struct sk_buff *skb;
>> +       struct rx_desc *desc;
>> +       unsigned char *buf;
>> +       dma_addr_t phys;
>> +       int rx = 0;
>> +       u16 len;
>> +       u32 err;
>> +
>> +       while (cnt) {
>> +               buf = priv->rx_buf[priv->rx_head];
>> +               skb = build_skb(buf, priv->rx_buf_size);
>> +               if (unlikely(!skb))
>> +                       net_dbg_ratelimited("build_skb failed\n");
>> +
>> +               dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
>> +                               RX_BUF_SIZE, DMA_FROM_DEVICE);
>> +               priv->rx_phys[priv->rx_head] = 0;
>> +
>> +               desc = (struct rx_desc *)skb->data;
>> +               len = be16_to_cpu(desc->pkt_len);
>> +               err = be32_to_cpu(desc->pkt_err);
>> +
>> +               if (len > RX_BUF_SIZE)
>> +                       len = RX_BUF_SIZE;
>> +               if (0 == len)
>> +                       break;
>
> Should not this be a continue? This is an error packet, so you should
> keep on processing the others, or does this have a special meaning?
len == 0 indicates the last packet.
Will change the behavior here to:

	if (len == 0) {
		dev_kfree_skb_any(skb);
		last = true;
	} else if (err & RX_PKT_ERR) {
		dev_kfree_skb_any(skb);
		stats->rx_dropped++;
		stats->rx_errors++;
	} else {
		skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
		skb_put(skb, len);
		skb->protocol = eth_type_trans(skb, ndev);
		napi_gro_receive(&priv->napi, skb);
		stats->rx_packets++;
		stats->rx_bytes += len;
	}
>
>> +
>> +               if (err & RX_PKT_ERR) {
>> +                       dev_kfree_skb_any(skb);
>> +                       stats->rx_dropped++;
>> +                       stats->rx_errors++;
>> +                       continue;
>> +               }
>> +
>> +               stats->rx_packets++;
>> +               stats->rx_bytes += len;
>> +
>> +               skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
>> +               skb_put(skb, len);
>> +               skb->protocol = eth_type_trans(skb, ndev);
>> +               napi_gro_receive(&priv->napi, skb);
>> +
>> +               buf = netdev_alloc_frag(priv->rx_buf_size);
>> +               if (!buf)
>> +                       return -ENOMEM;
>> +               phys = dma_map_single(&ndev->dev, buf,
>> +                               RX_BUF_SIZE, DMA_FROM_DEVICE);
>
> Missing dma_mapping_error() check here.
Yes, thanks

>
>> +               priv->rx_buf[priv->rx_head] = buf;
>> +               priv->rx_phys[priv->rx_head] = phys;
>> +               hip04_set_recv_desc(priv, phys);
>> +
>> +               priv->rx_head = RX_NEXT(priv->rx_head);
>> +               if (rx++ >= budget)
>> +                       break;
>> +
>> +               if (--cnt == 0)
>> +                       cnt = hip04_recv_cnt(priv);
>
>> +       }
>> +
>> +       if (rx < budget) {
>> +               napi_complete(napi);
>> +
>> +               /* enable rx interrupt */
>> +               priv->reg_inten |= RCV_INT | RCV_NOBUF;
>> +               writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
>> +       }
>> +
>> +       return rx;
>> +}
>> +
>> +static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
>> +{
>> +       struct net_device *ndev = (struct net_device *) dev_id;
>> +       struct hip04_priv *priv = netdev_priv(ndev);
>> +       u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
>> +       u32 val = DEF_INT_MASK;
>> +
>> +       writel_relaxed(val, priv->base + PPE_RINT);
>> +
>> +       if (ists & (RCV_INT | RCV_NOBUF)) {
>> +               if (napi_schedule_prep(&priv->napi)) {
>> +                       /* disable rx interrupt */
>> +                       priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
>> +                       writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
>> +                       __napi_schedule(&priv->napi);
>> +               }
>> +       }
>
> You should also process TX completion interrupts here
There is no such interrupt.
>
>> +
>> +       return IRQ_HANDLED;
>> +}
>> +
>> +static void hip04_tx_reclaim(struct net_device *ndev, bool force)
>> +{
>> +       struct hip04_priv *priv = netdev_priv(ndev);
>> +       unsigned tx_head = priv->tx_head;
>> +       unsigned tx_tail = priv->tx_tail;
>> +       struct tx_desc *desc = &priv->tx_desc[priv->tx_tail];
>> +
>> +       while (tx_tail != tx_head) {
>> +               if (desc->send_addr != 0) {
>> +                       if (force)
>> +                               desc->send_addr = 0;
>> +                       else
>> +                               break;
>> +               }
>> +               if (priv->tx_phys[tx_tail]) {
>> +                       dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
>> +                               priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
>> +                       priv->tx_phys[tx_tail] = 0;
>> +               }
>> +               dev_kfree_skb_irq(priv->tx_skb[tx_tail]);
>
> dev_kfree_skb_irq() bypasses all sort of SKB tracking, you might want
> to use kfree_skb() here instead.
OK, will use dev_kfree_skb instead.

>
>> +               priv->tx_skb[tx_tail] = NULL;
>> +               tx_tail = TX_NEXT(tx_tail);
>> +               priv->tx_count--;
>> +       }
>> +       priv->tx_tail = tx_tail;
>> +}
>> +
>> +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
>> +{
>> +       struct hip04_priv *priv = netdev_priv(ndev);
>> +       struct net_device_stats *stats = &ndev->stats;
>> +       unsigned int tx_head = priv->tx_head;
>> +       struct tx_desc *desc = &priv->tx_desc[tx_head];
>> +       dma_addr_t phys;
>> +
>> +       hip04_tx_reclaim(ndev, false);
>> +
>> +       if (priv->tx_count++ >= TX_DESC_NUM) {
>> +               net_dbg_ratelimited("no TX space for packet\n");
>> +               netif_stop_queue(ndev);
>> +               return NETDEV_TX_BUSY;
>> +       }
>> +
>> +       phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
>
> Missing dma_mapping_error() check here
>
>> +       priv->tx_skb[tx_head] = skb;
>> +       priv->tx_phys[tx_head] = phys;
>> +       desc->send_addr = cpu_to_be32(phys);
>> +       desc->send_size = cpu_to_be16(skb->len);
>> +       desc->cfg = cpu_to_be32(DESC_DEF_CFG);
>> +       phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
>> +       desc->wb_addr = cpu_to_be32(phys);
>
> Don't we need a barrier here to ensure that all stores are completed
> before we hand this descriptor address to hip40_set_xmit_desc() which
> should make DMA start processing it?
>
>> +       skb_tx_timestamp(skb);
>> +       hip04_set_xmit_desc(priv, phys);
>> +       priv->tx_head = TX_NEXT(tx_head);
>> +
>> +       stats->tx_bytes += skb->len;
>> +       stats->tx_packets++;
>
> You cannot update the transmit stats here, what start_xmit() does it
> just queue packets for the DMA engine to process them, but that does
> not mean DMA has completed those. You should update statistics in the
> tx_reclaim() function.
Yes; however, since there is no TX completion interrupt, tx_reclaim may be
called rather late, so it may be more suitable to update the statistics here.

Thanks

^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-25 17:57                   ` Arnd Bergmann
@ 2014-03-26  9:55                     ` David Laight
  -1 siblings, 0 replies; 148+ messages in thread
From: David Laight @ 2014-03-26  9:55 UTC (permalink / raw)
  To: 'Arnd Bergmann', Florian Fainelli
  Cc: Zhangfei Gao, linux-arm-kernel, Mark Rutland, devicetree,
	Russell King - ARM Linux, Sergei Shtylyov, netdev, Zhangfei Gao,
	David S. Miller

From: Arnd Bergmann
> On Tuesday 25 March 2014 10:16:28 Florian Fainelli wrote:
> >
> > Ok, well that's really unfortunate, to achieve the best of everything,
> > the workaround should probably look like:
> >
> > - keep reclaiming TX buffers in ndo_start_xmit() in case you push more
> > packets to the NICs than your timer can free
> > - reclaim TX buffers in NAPI poll() context for "symmetrical" workloads
> > where e.g: TCP ACKs received allow you to complete TX buffers
> > - have a timer like you suggest which should help with transmit only
> > workloads at a slow rate
> 
> Yes, that is what I was thinking, but with orphaning the tx skbs,
> we can probably be a little smarter. Note that in order to check
> the state of the queue, we have to do a read from uncached memory,
> since the hardware also doesn't support cache coherent DMA.
> We don't want to do that too often.

Possibly you can check for all the pending transmits having
completed - the most likely case. Instead of checking them
individually?

You should limit the number of tx bytes as well as the number
of tx frames - buffering a ring full of large frames subverts
some of the algorithms higher up the stack.

	David

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-25 17:16                 ` Florian Fainelli
@ 2014-03-25 17:57                   ` Arnd Bergmann
  -1 siblings, 0 replies; 148+ messages in thread
From: Arnd Bergmann @ 2014-03-25 17:57 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Mark Rutland, devicetree, Russell King - ARM Linux,
	Sergei Shtylyov, netdev, Zhangfei Gao, Zhangfei Gao,
	David S. Miller, linux-arm-kernel

On Tuesday 25 March 2014 10:16:28 Florian Fainelli wrote:
> 
> Ok, well that's really unfortunate, to achieve the best of everything,
> the workaround should probably look like:
> 
> - keep reclaiming TX buffers in ndo_start_xmit() in case you push more
> packets to the NICs than your timer can free
> - reclaim TX buffers in NAPI poll() context for "symmetrical" workloads
> where e.g: TCP ACKs received allow you to complete TX buffers
> - have a timer like you suggest which should help with transmit only
> workloads at a slow rate

Yes, that is what I was thinking, but with orphaning the tx skbs,
we can probably be a little smarter. Note that in order to check
the state of the queue, we have to do a read from uncached memory,
since the hardware also doesn't support cache coherent DMA.
We don't want to do that too often.

	Arnd

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-25 17:21                 ` Eric Dumazet
@ 2014-03-25 17:54                   ` Arnd Bergmann
  -1 siblings, 0 replies; 148+ messages in thread
From: Arnd Bergmann @ 2014-03-25 17:54 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Florian Fainelli, Zhangfei Gao, linux-arm-kernel, Mark Rutland,
	devicetree, Russell King - ARM Linux, Sergei Shtylyov, netdev,
	Zhangfei Gao, David S. Miller

On Tuesday 25 March 2014 10:21:42 Eric Dumazet wrote:
> On Tue, 2014-03-25 at 18:05 +0100, Arnd Bergmann wrote:
> > On Tuesday 25 March 2014 10:00:30 Florian Fainelli wrote:
> >
> > > Using a timer to ensure completion of TX packets is a trick that
> > > worked in the past, but now that the networking stack got smarter,
> > > this might artificially increase the processing time of packets in the
> > > transmit path, and this will defeat features like TCP small queues
> > > etc.. as could be seen with the mvneta driver [1]. The best way really
> > > is to rely on TX completion interrupts when those exist as they cannot
> > > lie about the hardware status (in theory) and they should provide the
> > > fastest way to complete TX packets.
> > 
> But as Zhangfei Gao pointed out, this hardware does not have a working
> > TX completion interrupt. Using timers to do this has always just been
> > a workaround for broken hardware IMHO.
> 
> For this kind of drivers, calling skb_orphan() from ndo_start_xmit() is
> mandatory.

Cool, thanks for the information, I was wondering already if there was
a way to deal with hardware like this.

	Arnd

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-25 17:05               ` Arnd Bergmann
@ 2014-03-25 17:21                 ` Eric Dumazet
  -1 siblings, 0 replies; 148+ messages in thread
From: Eric Dumazet @ 2014-03-25 17:21 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Mark Rutland, devicetree, Florian Fainelli,
	Russell King - ARM Linux, Sergei Shtylyov, netdev, Zhangfei Gao,
	Zhangfei Gao, David S. Miller, linux-arm-kernel

On Tue, 2014-03-25 at 18:05 +0100, Arnd Bergmann wrote:
> On Tuesday 25 March 2014 10:00:30 Florian Fainelli wrote:
> > 2014-03-25 1:12 GMT-07:00 Arnd Bergmann <arnd@arndb.de>:
> > > On Tuesday 25 March 2014 12:06:31 Zhangfei Gao wrote:
> > >> Dear Arnd
> > >>
> > >> On Mon, Mar 24, 2014 at 11:18 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> > >> > On Monday 24 March 2014 22:14:56 Zhangfei Gao wrote:
> > >> >
...
> > >> > I think you still need to find a solution to ensure that the tx reclaim is
> > >> > called eventually through a method other than start_xmit.
> > >>
> > >> In the iperf stress test, if I move reclaim to poll(), there are some
> > >> errors, sometimes sending zero packets.
> > >> Keeping reclaim in xmit to reclaim transmitted packets looks
> > >> stable in the test,
> > >> and all TX_DESC_NUM descriptors can be used.
> > >
> > > What I meant is that you need a correct implementation, presumably
> > > you added a bug when you moved the function to poll(), and also you
> > > forgot to add a timer.
> > 
> > Using a timer to ensure completion of TX packets is a trick that
> > worked in the past, but now that the networking stack got smarter,
> > this might artificially increase the processing time of packets in the
> > transmit path, and this will defeat features like TCP small queues
> > etc.. as could be seen with the mvneta driver [1]. The best way really
> > is to rely on TX completion interrupts when those exist as they cannot
> > lie about the hardware status (in theory) and they should provide the
> > fastest way to complete TX packets.
> 
> But as Zhangfei Gao pointed out, this hardware does not have a working
> TX completion interrupt. Using timers to do this has always just been
> a workaround for broken hardware IMHO.

For this kind of drivers, calling skb_orphan() from ndo_start_xmit() is
mandatory.

^ permalink raw reply	[flat|nested] 148+ messages in thread

* RE: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-25 17:05               ` Arnd Bergmann
@ 2014-03-25 17:17                 ` David Laight
  -1 siblings, 0 replies; 148+ messages in thread
From: David Laight @ 2014-03-25 17:17 UTC (permalink / raw)
  To: 'Arnd Bergmann', Florian Fainelli
  Cc: Zhangfei Gao, linux-arm-kernel, Mark Rutland, devicetree,
	Russell King - ARM Linux, Sergei Shtylyov, netdev, Zhangfei Gao,
	David S. Miller

From: Arnd Bergmann
> > Using a timer to ensure completion of TX packets is a trick that
> > worked in the past, but now that the networking stack got smarter,
> > this might artificially increase the processing time of packets in the
> > transmit path, and this will defeat features like TCP small queues
> > etc.. as could be seen with the mvneta driver [1]. The best way really
> > is to rely on TX completion interrupts when those exist as they cannot
> > lie about the hardware status (in theory) and they should provide the
> > fastest way to complete TX packets.
> 
> But as Zhangfei Gao pointed out, this hardware does not have a working
> TX completion interrupt. Using timers to do this has always just been
> a workaround for broken hardware IMHO.

I remember disabling the 'tx done' interrupt (unless the tx ring
was full) in order to get a significant increase in throughput
due to the reduced interrupt load.
The 'interrupt mitigation' logic on modern hardware probably makes
this less of a problem.

It might be possible to orphan the skb when they are put into the
tx ring, and to significantly limit the number of bytes in the
tx ring (BQL?).
That might upset TCP small queues less than delaying the actual
tx completions.

	David

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-25 17:05               ` Arnd Bergmann
@ 2014-03-25 17:16                 ` Florian Fainelli
  -1 siblings, 0 replies; 148+ messages in thread
From: Florian Fainelli @ 2014-03-25 17:16 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Zhangfei Gao, linux-arm-kernel, Mark Rutland, devicetree,
	Russell King - ARM Linux, Sergei Shtylyov, netdev, Zhangfei Gao,
	David S. Miller

2014-03-25 10:05 GMT-07:00 Arnd Bergmann <arnd@arndb.de>:
> On Tuesday 25 March 2014 10:00:30 Florian Fainelli wrote:
>> 2014-03-25 1:12 GMT-07:00 Arnd Bergmann <arnd@arndb.de>:
>> > On Tuesday 25 March 2014 12:06:31 Zhangfei Gao wrote:
>> >> Dear Arnd
>> >>
>> >> On Mon, Mar 24, 2014 at 11:18 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> >> > On Monday 24 March 2014 22:14:56 Zhangfei Gao wrote:
>> >> >
>> >> >> +
>> >> >> +static void hip04_tx_reclaim(struct net_device *ndev, bool force)
>> >> >> +{
>> >> >> +     struct hip04_priv *priv = netdev_priv(ndev);
>> >> >> +     unsigned tx_head = priv->tx_head;
>> >> >> +     unsigned tx_tail = priv->tx_tail;
>> >> >> +     struct tx_desc *desc = &priv->tx_desc[priv->tx_tail];
>> >> >> +
>> >> >> +     while (tx_tail != tx_head) {
>> >> >> +             if (desc->send_addr != 0) {
>> >> >> +                     if (force)
>> >> >> +                             desc->send_addr = 0;
>> >> >> +                     else
>> >> >> +                             break;
>> >> >> +             }
>> >> >> +             if (priv->tx_phys[tx_tail]) {
>> >> >> +                     dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
>> >> >> +                             priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
>> >> >> +                     priv->tx_phys[tx_tail] = 0;
>> >> >> +             }
>> >> >> +             dev_kfree_skb_irq(priv->tx_skb[tx_tail]);
>> >> >> +             priv->tx_skb[tx_tail] = NULL;
>> >> >> +             tx_tail = TX_NEXT(tx_tail);
>> >> >> +             priv->tx_count--;
>> >> >> +     }
>> >> >> +     priv->tx_tail = tx_tail;
>> >> >> +}
>> >> >
>> >> > I think you still need to find a solution to ensure that the tx reclaim is
>> >> > called eventually through a method other than start_xmit.
>> >>
>> >> In the iperf stress test, if I move reclaim to poll(), there are some
>> >> errors, sometimes sending zero packets.
>> >> Keeping reclaim in xmit to reclaim transmitted packets looks
>> >> stable in the test,
>> >> and all TX_DESC_NUM descriptors can be used.
>> >
>> > What I meant is that you need a correct implementation, presumably
>> > you added a bug when you moved the function to poll(), and also you
>> > forgot to add a timer.
>>
>> Using a timer to ensure completion of TX packets is a trick that
>> worked in the past, but now that the networking stack got smarter,
>> this might artificially increase the processing time of packets in the
>> transmit path, and this will defeat features like TCP small queues
>> etc.. as could be seen with the mvneta driver [1]. The best way really
>> is to rely on TX completion interrupts when those exist as they cannot
>> lie about the hardware status (in theory) and they should provide the
>> fastest way to complete TX packets.
>
> But as Zhangfei Gao pointed out, this hardware does not have a working
> TX completion interrupt. Using timers to do this has always just been
> a workaround for broken hardware IMHO.

Ok, well that's really unfortunate, to achieve the best of everything,
the workaround should probably look like:

- keep reclaiming TX buffers in ndo_start_xmit() in case you push more
packets to the NICs than your timer can free
- reclaim TX buffers in NAPI poll() context for "symmetrical" workloads
where e.g: TCP ACKs received allow you to complete TX buffers
- have a timer like you suggest which should help with transmit only
workloads at a slow rate
-- 
Florian

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-25 17:00             ` Florian Fainelli
@ 2014-03-25 17:05               ` Arnd Bergmann
  -1 siblings, 0 replies; 148+ messages in thread
From: Arnd Bergmann @ 2014-03-25 17:05 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Zhangfei Gao, linux-arm-kernel, Mark Rutland, devicetree,
	Russell King - ARM Linux, Sergei Shtylyov, netdev, Zhangfei Gao,
	David S. Miller

On Tuesday 25 March 2014 10:00:30 Florian Fainelli wrote:
> 2014-03-25 1:12 GMT-07:00 Arnd Bergmann <arnd@arndb.de>:
> > On Tuesday 25 March 2014 12:06:31 Zhangfei Gao wrote:
> >> Dear Arnd
> >>
> >> On Mon, Mar 24, 2014 at 11:18 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> >> > On Monday 24 March 2014 22:14:56 Zhangfei Gao wrote:
> >> >
> >> >> +
> >> >> +static void hip04_tx_reclaim(struct net_device *ndev, bool force)
> >> >> +{
> >> >> +     struct hip04_priv *priv = netdev_priv(ndev);
> >> >> +     unsigned tx_head = priv->tx_head;
> >> >> +     unsigned tx_tail = priv->tx_tail;
> >> >> +     struct tx_desc *desc = &priv->tx_desc[priv->tx_tail];
> >> >> +
> >> >> +     while (tx_tail != tx_head) {
> >> >> +             if (desc->send_addr != 0) {
> >> >> +                     if (force)
> >> >> +                             desc->send_addr = 0;
> >> >> +                     else
> >> >> +                             break;
> >> >> +             }
> >> >> +             if (priv->tx_phys[tx_tail]) {
> >> >> +                     dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
> >> >> +                             priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
> >> >> +                     priv->tx_phys[tx_tail] = 0;
> >> >> +             }
> >> >> +             dev_kfree_skb_irq(priv->tx_skb[tx_tail]);
> >> >> +             priv->tx_skb[tx_tail] = NULL;
> >> >> +             tx_tail = TX_NEXT(tx_tail);
> >> >> +             priv->tx_count--;
> >> >> +     }
> >> >> +     priv->tx_tail = tx_tail;
> >> >> +}
> >> >
> >> > I think you still need to find a solution to ensure that the tx reclaim is
> >> > called eventually through a method other than start_xmit.
> >>
> >> In the iperf stress test, if we move reclaim to poll(), there are
> >> some errors, sometimes zero packets are sent.
> >> Keeping reclaim in xmit to reclaim transmitted packets looks
> >> stable in the test, and all TX_DESC_NUM descriptors can be used.
> >
> > What I meant is that you need a correct implementation, presumably
> > you added a bug when you moved the function to poll(), and also you
> > forgot to add a timer.
> 
> Using a timer to ensure completion of TX packets is a trick that
> worked in the past, but now that the networking stack got smarter,
> this might artificially increase the processing time of packets in the
> transmit path, and this will defeat features like TCP small queues
> etc., as could be seen with the mvneta driver [1]. The best way really
> is to rely on TX completion interrupts when those exist as they cannot
> lie about the hardware status (in theory) and they should provide the
> fastest way to complete TX packets.

But as Zhangfei Gao pointed out, this hardware does not have a working
TX completion interrupt. Using timers to do this has always just been
a workaround for broken hardware IMHO.

	Arnd

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-25  8:12           ` Arnd Bergmann
@ 2014-03-25 17:00             ` Florian Fainelli
  -1 siblings, 0 replies; 148+ messages in thread
From: Florian Fainelli @ 2014-03-25 17:00 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Zhangfei Gao, linux-arm-kernel, Mark Rutland, devicetree,
	Russell King - ARM Linux, Sergei Shtylyov, netdev, Zhangfei Gao,
	David S. Miller

2014-03-25 1:12 GMT-07:00 Arnd Bergmann <arnd@arndb.de>:
> On Tuesday 25 March 2014 12:06:31 Zhangfei Gao wrote:
>> Dear Arnd
>>
>> On Mon, Mar 24, 2014 at 11:18 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> > On Monday 24 March 2014 22:14:56 Zhangfei Gao wrote:
>> >
>> >> +
>> >> +static void hip04_tx_reclaim(struct net_device *ndev, bool force)
>> >> +{
>> >> +     struct hip04_priv *priv = netdev_priv(ndev);
>> >> +     unsigned tx_head = priv->tx_head;
>> >> +     unsigned tx_tail = priv->tx_tail;
>> >> +     struct tx_desc *desc = &priv->tx_desc[priv->tx_tail];
>> >> +
>> >> +     while (tx_tail != tx_head) {
>> >> +             if (desc->send_addr != 0) {
>> >> +                     if (force)
>> >> +                             desc->send_addr = 0;
>> >> +                     else
>> >> +                             break;
>> >> +             }
>> >> +             if (priv->tx_phys[tx_tail]) {
>> >> +                     dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
>> >> +                             priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
>> >> +                     priv->tx_phys[tx_tail] = 0;
>> >> +             }
>> >> +             dev_kfree_skb_irq(priv->tx_skb[tx_tail]);
>> >> +             priv->tx_skb[tx_tail] = NULL;
>> >> +             tx_tail = TX_NEXT(tx_tail);
>> >> +             priv->tx_count--;
>> >> +     }
>> >> +     priv->tx_tail = tx_tail;
>> >> +}
>> >
>> > I think you still need to find a solution to ensure that the tx reclaim is
>> > called eventually through a method other than start_xmit.
>>
>> In the iperf stress test, if we move reclaim to poll(), there are
>> some errors, sometimes zero packets are sent.
>> Keeping reclaim in xmit to reclaim transmitted packets looks
>> stable in the test, and all TX_DESC_NUM descriptors can be used.
>
> What I meant is that you need a correct implementation, presumably
> you added a bug when you moved the function to poll(), and also you
> forgot to add a timer.

Using a timer to ensure completion of TX packets is a trick that
worked in the past, but now that the networking stack got smarter,
this might artificially increase the processing time of packets in the
transmit path, and this will defeat features like TCP small queues
etc., as could be seen with the mvneta driver [1]. The best way really
is to rely on TX completion interrupts when those exist as they cannot
lie about the hardware status (in theory) and they should provide the
fastest way to complete TX packets.
-- 
Florian

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-25  4:06         ` Zhangfei Gao
@ 2014-03-25  8:12           ` Arnd Bergmann
  -1 siblings, 0 replies; 148+ messages in thread
From: Arnd Bergmann @ 2014-03-25  8:12 UTC (permalink / raw)
  To: Zhangfei Gao
  Cc: Mark Rutland, devicetree, Florian Fainelli,
	Russell King - ARM Linux, Sergei Shtylyov, netdev, Zhangfei Gao,
	David S. Miller, linux-arm-kernel

On Tuesday 25 March 2014 12:06:31 Zhangfei Gao wrote:
> Dear Arnd
> 
> On Mon, Mar 24, 2014 at 11:18 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> > On Monday 24 March 2014 22:14:56 Zhangfei Gao wrote:
> >
> >> +
> >> +static void hip04_tx_reclaim(struct net_device *ndev, bool force)
> >> +{
> >> +     struct hip04_priv *priv = netdev_priv(ndev);
> >> +     unsigned tx_head = priv->tx_head;
> >> +     unsigned tx_tail = priv->tx_tail;
> >> +     struct tx_desc *desc = &priv->tx_desc[priv->tx_tail];
> >> +
> >> +     while (tx_tail != tx_head) {
> >> +             if (desc->send_addr != 0) {
> >> +                     if (force)
> >> +                             desc->send_addr = 0;
> >> +                     else
> >> +                             break;
> >> +             }
> >> +             if (priv->tx_phys[tx_tail]) {
> >> +                     dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
> >> +                             priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
> >> +                     priv->tx_phys[tx_tail] = 0;
> >> +             }
> >> +             dev_kfree_skb_irq(priv->tx_skb[tx_tail]);
> >> +             priv->tx_skb[tx_tail] = NULL;
> >> +             tx_tail = TX_NEXT(tx_tail);
> >> +             priv->tx_count--;
> >> +     }
> >> +     priv->tx_tail = tx_tail;
> >> +}
> >
> > I think you still need to find a solution to ensure that the tx reclaim is
> > called eventually through a method other than start_xmit.
> 
> In the iperf stress test, if we move reclaim to poll(), there are
> some errors, sometimes zero packets are sent.
> Keeping reclaim in xmit to reclaim transmitted packets looks
> stable in the test, and all TX_DESC_NUM descriptors can be used.

What I meant is that you need a correct implementation, presumably
you added a bug when you moved the function to poll(), and also you
forgot to add a timer.

> >> +     priv->map = syscon_regmap_lookup_by_compatible("hisilicon,hip04-ppe");
> >> +     if (IS_ERR(priv->map)) {
> >> +             dev_warn(d, "no hisilicon,hip04-ppe\n");
> >> +             ret = PTR_ERR(priv->map);
> >> +             goto init_fail;
> >> +     }
> >> +
> >> +     n = of_parse_phandle(node, "port-handle", 0);
> >> +     if (n) {
> >> +             ret = of_property_read_u32(n, "reg", &priv->port);
> >> +             if (ret) {
> >> +                     dev_warn(d, "no reg info\n");
> >> +                     goto init_fail;
> >> +             }
> >> +
> >> +             ret = of_property_read_u32(n, "channel", &priv->chan);
> >> +             if (ret) {
> >> +                     dev_warn(d, "no channel info\n");
> >> +                     goto init_fail;
> >> +             }
> >> +     } else {
> >> +             dev_warn(d, "no port-handle\n");
> >> +             ret = -EINVAL;
> >> +             goto init_fail;
> >> +     }
> >
> > Doing the lookup by "compatible" string doesn't really help you
> > to solve the problem of single-instance ppe at all, because that
> > function will only ever look at the first one. Use
> > syscon_regmap_lookup_by_phandle instead and pass the phandle
> > you get from the "port-handle" property.
> 
> I did some experiments; the only ppe base is obtained from the syscon probe.
> And looking up by "compatible" did work for three controllers at the same time.

The point is that it won't work for more than one ppe instance.

> > Also, since you decided to treat the ppe as a dumb regmap, I would
> > recommend moving the 'reg' and 'channel' properties into arguments
> > of the port-handle link, and getting rid of the child nodes of
> > the ppe, like:
> >
> > +       ppe: ppe@28c0000 {
> > +               compatible = "hisilicon,hip04-ppe", "syscon";
> > +               reg = <0x28c0000 0x10000>;
> > +       };
> > +
> > +       fe: ethernet@28b0000 {
> > +               compatible = "hisilicon,hip04-mac";
> > +               reg = <0x28b0000 0x10000>;
> > +               interrupts = <0 413 4>;
> > +               phy-mode = "mii";
> > +               port-handle = <&ppe 0xf1 0>;
> > +       };
> >
> 
> Would you mind giving more hints about how to parse the args?
> I am thinking of using of_parse_phandle_with_args(); is that the correct method?

I would just use of_property_read_u32_array() to read all three cells,
then pass the first cell entry into syscon_regmap_lookup_by_phandle.
of_parse_phandle_with_args() is what you would use if you were registering
a higher-level driver for the ppe rather than using syscon.
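
For illustration only, once the three cells of
`port-handle = <&ppe 0xf1 0>` have been read into an array, they could
be unpacked like this (the struct and function names are made up, and
the phandle is kept as an opaque number here rather than being resolved
through syscon):

```c
#include <assert.h>

/* Hypothetical holder for the parsed port-handle cells. */
struct port_cfg {
	unsigned int ppe_phandle;	/* cell 0: resolved to the ppe regmap */
	unsigned int port;		/* cell 1: was the child 'reg' value  */
	unsigned int chan;		/* cell 2: was the 'channel' value    */
};

static void parse_port_handle(const unsigned int cells[3],
			      struct port_cfg *cfg)
{
	cfg->ppe_phandle = cells[0];
	cfg->port = cells[1];
	cfg->chan = cells[2];
}
```

In the driver, the first cell is what would be handed to syscon to look
up the regmap, as described above.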

	Arnd

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-24 15:18       ` Arnd Bergmann
@ 2014-03-25  4:06         ` Zhangfei Gao
  -1 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-25  4:06 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, Mark Rutland,
	devicetree, Florian Fainelli,
	Russell King - ARM Linux, Sergei Shtylyov, netdev, Zhangfei Gao,
	David S. Miller

Dear Arnd

On Mon, Mar 24, 2014 at 11:18 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Monday 24 March 2014 22:14:56 Zhangfei Gao wrote:
>
>> +
>> +static void hip04_tx_reclaim(struct net_device *ndev, bool force)
>> +{
>> +     struct hip04_priv *priv = netdev_priv(ndev);
>> +     unsigned tx_head = priv->tx_head;
>> +     unsigned tx_tail = priv->tx_tail;
>> +     struct tx_desc *desc = &priv->tx_desc[priv->tx_tail];
>> +
>> +     while (tx_tail != tx_head) {
>> +             if (desc->send_addr != 0) {
>> +                     if (force)
>> +                             desc->send_addr = 0;
>> +                     else
>> +                             break;
>> +             }
>> +             if (priv->tx_phys[tx_tail]) {
>> +                     dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
>> +                             priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
>> +                     priv->tx_phys[tx_tail] = 0;
>> +             }
>> +             dev_kfree_skb_irq(priv->tx_skb[tx_tail]);
>> +             priv->tx_skb[tx_tail] = NULL;
>> +             tx_tail = TX_NEXT(tx_tail);
>> +             priv->tx_count--;
>> +     }
>> +     priv->tx_tail = tx_tail;
>> +}
>
> I think you still need to find a solution to ensure that the tx reclaim is
> called eventually through a method other than start_xmit.

In the iperf stress test, if we move reclaim to poll(), there are some
errors, sometimes zero packets are sent.
Keeping reclaim in xmit to reclaim transmitted packets looks stable in
the test, and all TX_DESC_NUM descriptors can be used.

>
>> +
>> +     priv->map = syscon_regmap_lookup_by_compatible("hisilicon,hip04-ppe");
>> +     if (IS_ERR(priv->map)) {
>> +             dev_warn(d, "no hisilicon,hip04-ppe\n");
>> +             ret = PTR_ERR(priv->map);
>> +             goto init_fail;
>> +     }
>> +
>> +     n = of_parse_phandle(node, "port-handle", 0);
>> +     if (n) {
>> +             ret = of_property_read_u32(n, "reg", &priv->port);
>> +             if (ret) {
>> +                     dev_warn(d, "no reg info\n");
>> +                     goto init_fail;
>> +             }
>> +
>> +             ret = of_property_read_u32(n, "channel", &priv->chan);
>> +             if (ret) {
>> +                     dev_warn(d, "no channel info\n");
>> +                     goto init_fail;
>> +             }
>> +     } else {
>> +             dev_warn(d, "no port-handle\n");
>> +             ret = -EINVAL;
>> +             goto init_fail;
>> +     }
>
> Doing the lookup by "compatible" string doesn't really help you
> to solve the problem of single-instance ppe at all, because that
> function will only ever look at the first one. Use
> syscon_regmap_lookup_by_phandle instead and pass the phandle
> you get from the "port-handle" property.

I did some experiments; the only ppe base is obtained from the syscon probe.
And looking up by "compatible" did work for three controllers at the same time.

>
> > Also, since you decided to treat the ppe as a dumb regmap, I would
> > recommend moving the 'reg' and 'channel' properties into arguments
> > of the port-handle link, and getting rid of the child nodes of
> the ppe, like:
>
> +       ppe: ppe@28c0000 {
> +               compatible = "hisilicon,hip04-ppe", "syscon";
> +               reg = <0x28c0000 0x10000>;
> +       };
> +
> +       fe: ethernet@28b0000 {
> +               compatible = "hisilicon,hip04-mac";
> +               reg = <0x28b0000 0x10000>;
> +               interrupts = <0 413 4>;
> +               phy-mode = "mii";
> +               port-handle = <&ppe 0xf1 0>;
> +       };
>

Would you mind giving more hints about how to parse the args?
I am thinking of using of_parse_phandle_with_args(); is that the correct method?

Thanks

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-24 17:23       ` Arnd Bergmann
@ 2014-03-24 17:35         ` Florian Fainelli
  -1 siblings, 0 replies; 148+ messages in thread
From: Florian Fainelli @ 2014-03-24 17:35 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Mark Rutland, devicetree, Russell King, Sergei Shtylyov, netdev,
	Zhangfei Gao, David Miller, linux-arm-kernel

2014-03-24 10:23 GMT-07:00 Arnd Bergmann <arnd@arndb.de>:
> On Monday 24 March 2014 09:32:17 Florian Fainelli wrote:
>> > +       priv->tx_skb[tx_head] = skb;
>> > +       priv->tx_phys[tx_head] = phys;
>> > +       desc->send_addr = cpu_to_be32(phys);
>> > +       desc->send_size = cpu_to_be16(skb->len);
>> > +       desc->cfg = cpu_to_be32(DESC_DEF_CFG);
>> > +       phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
>> > +       desc->wb_addr = cpu_to_be32(phys);
>>
>> Don't we need a barrier here to ensure that all stores are completed
>> before we hand this descriptor address to hip04_set_xmit_desc(), which
>> should make the DMA start processing it?
>
> I would think the writel() in set_xmit_desc() implies the barrier.

Right, which means that this should be properly documented to make
sure that this simplification is well understood and produces the
expected result.
-- 
Florian

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-24 16:32     ` Florian Fainelli
@ 2014-03-24 17:23       ` Arnd Bergmann
  -1 siblings, 0 replies; 148+ messages in thread
From: Arnd Bergmann @ 2014-03-24 17:23 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Zhangfei Gao, David Miller, Russell King, Sergei Shtylyov,
	Mark Rutland, linux-arm-kernel, netdev, devicetree

On Monday 24 March 2014 09:32:17 Florian Fainelli wrote:
> > +       priv->tx_skb[tx_head] = skb;
> > +       priv->tx_phys[tx_head] = phys;
> > +       desc->send_addr = cpu_to_be32(phys);
> > +       desc->send_size = cpu_to_be16(skb->len);
> > +       desc->cfg = cpu_to_be32(DESC_DEF_CFG);
> > +       phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
> > +       desc->wb_addr = cpu_to_be32(phys);
> 
> Don't we need a barrier here to ensure that all stores are completed
> before we hand this descriptor address to hip04_set_xmit_desc() which
> should make DMA start processing it?

I would think the writel() in set_xmit_desc() implies the barrier.

	Arnd

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-24 14:14   ` Zhangfei Gao
@ 2014-03-24 16:32     ` Florian Fainelli
  -1 siblings, 0 replies; 148+ messages in thread
From: Florian Fainelli @ 2014-03-24 16:32 UTC (permalink / raw)
  To: Zhangfei Gao
  Cc: David Miller, Russell King, Arnd Bergmann, Sergei Shtylyov,
	Mark Rutland, linux-arm-kernel, netdev, devicetree

2014-03-24 7:14 GMT-07:00 Zhangfei Gao <zhangfei.gao@linaro.org>:
> Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller
>
> Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
> ---
>  drivers/net/ethernet/hisilicon/Makefile    |    2 +-
>  drivers/net/ethernet/hisilicon/hip04_eth.c |  728 ++++++++++++++++++++++++++++
>  2 files changed, 729 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c

[snip]

> +static void hip04_config_port(struct hip04_priv *priv, u32 speed, u32 duplex)
> +{
> +       u32 val;
> +
> +       priv->speed = speed;
> +       priv->duplex = duplex;
> +
> +       switch (priv->phy_mode) {
> +       case PHY_INTERFACE_MODE_SGMII:
> +               if (speed == SPEED_1000)
> +                       val = 8;
> +               else
> +                       val = 7;
> +               break;
> +       case PHY_INTERFACE_MODE_MII:
> +               val = 1;        /* SPEED_100 */
> +               break;
> +       default:
> +               val = 0;
> +               break;

Is 0 valid for e.g. 10 Mbit/s, regardless of the phy_mode?

[snip]

> +
> +static void hip04_mac_enable(struct net_device *ndev, bool enable)
> +{
> +       struct hip04_priv *priv = netdev_priv(ndev);
> +       u32 val;
> +
> +       if (enable) {
> +               /* enable tx & rx */
> +               val = readl_relaxed(priv->base + GE_PORT_EN);
> +               val |= BIT(1);          /* rx*/
> +               val |= BIT(2);          /* tx*/
> +               writel_relaxed(val, priv->base + GE_PORT_EN);
> +
> +               /* enable interrupt */
> +               priv->reg_inten = DEF_INT_MASK;
> +               writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
> +
> +               /* clear rx int */
> +               val = RCV_INT;
> +               writel_relaxed(val, priv->base + PPE_RINT);

Should you not first clear the interrupt and only then write DEF_INT_MASK?
Also, why is RCV_INT written to the PPE_RINT register in the enable path,
but not in the "disable" branch of your function?

> +
> +               /* config recv int*/
> +               val = BIT(6);           /* int threshold 1 package */
> +               val |= 0x4;             /* recv timeout */
> +               writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
> +       } else {
> +               /* disable int */
> +               priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
> +               writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
> +
> +               /* disable tx & rx */
> +               val = readl_relaxed(priv->base + GE_PORT_EN);
> +               val &= ~(BIT(1));       /* rx*/
> +               val &= ~(BIT(2));       /* tx*/
> +               writel_relaxed(val, priv->base + GE_PORT_EN);
> +       }

There is little to no sharing between the two branches, I would have
created separate helper functions for the enable/disable logic.

> +}
> +
> +static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
> +{
> +       writel(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);

This is not 64-bit/LPAE safe. Do you have a high address part and a
low address part for the address in the buffer descriptor? If so,
better to use them now.

[snip]

> +
> +static int hip04_rx_poll(struct napi_struct *napi, int budget)
> +{
> +       struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
> +       struct net_device *ndev = priv->ndev;
> +       struct net_device_stats *stats = &ndev->stats;
> +       unsigned int cnt = hip04_recv_cnt(priv);
> +       struct sk_buff *skb;
> +       struct rx_desc *desc;
> +       unsigned char *buf;
> +       dma_addr_t phys;
> +       int rx = 0;
> +       u16 len;
> +       u32 err;
> +
> +       while (cnt) {
> +               buf = priv->rx_buf[priv->rx_head];
> +               skb = build_skb(buf, priv->rx_buf_size);
> +               if (unlikely(!skb))
> +                       net_dbg_ratelimited("build_skb failed\n");
> +
> +               dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
> +                               RX_BUF_SIZE, DMA_FROM_DEVICE);
> +               priv->rx_phys[priv->rx_head] = 0;
> +
> +               desc = (struct rx_desc *)skb->data;
> +               len = be16_to_cpu(desc->pkt_len);
> +               err = be32_to_cpu(desc->pkt_err);
> +
> +               if (len > RX_BUF_SIZE)
> +                       len = RX_BUF_SIZE;
> +               if (0 == len)
> +                       break;

Should this not be a continue? This is an error packet, so you should
keep processing the others, or does a zero length have a special meaning?

> +
> +               if (err & RX_PKT_ERR) {
> +                       dev_kfree_skb_any(skb);
> +                       stats->rx_dropped++;
> +                       stats->rx_errors++;
> +                       continue;
> +               }
> +
> +               stats->rx_packets++;
> +               stats->rx_bytes += len;
> +
> +               skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
> +               skb_put(skb, len);
> +               skb->protocol = eth_type_trans(skb, ndev);
> +               napi_gro_receive(&priv->napi, skb);
> +
> +               buf = netdev_alloc_frag(priv->rx_buf_size);
> +               if (!buf)
> +                       return -ENOMEM;
> +               phys = dma_map_single(&ndev->dev, buf,
> +                               RX_BUF_SIZE, DMA_FROM_DEVICE);

Missing dma_mapping_error() check here.
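
The check being asked for typically follows this pattern (a generic sketch of the dma_map_single()/dma_mapping_error() idiom, not the final driver code; the recovery strategy shown is one possible choice):

```c
/* Sketch: detect a failed streaming mapping before giving the
 * address to the hardware. */
phys = dma_map_single(&ndev->dev, buf, RX_BUF_SIZE, DMA_FROM_DEVICE);
if (dma_mapping_error(&ndev->dev, phys)) {
	put_page(virt_to_head_page(buf));  /* drop the fresh fragment */
	break;	/* keep the ring consistent; refill on the next poll */
}
```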

> +               priv->rx_buf[priv->rx_head] = buf;
> +               priv->rx_phys[priv->rx_head] = phys;
> +               hip04_set_recv_desc(priv, phys);
> +
> +               priv->rx_head = RX_NEXT(priv->rx_head);
> +               if (rx++ >= budget)
> +                       break;
> +
> +               if (--cnt == 0)
> +                       cnt = hip04_recv_cnt(priv);

> +       }
> +
> +       if (rx < budget) {
> +               napi_complete(napi);
> +
> +               /* enable rx interrupt */
> +               priv->reg_inten |= RCV_INT | RCV_NOBUF;
> +               writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
> +       }
> +
> +       return rx;
> +}
> +
> +static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
> +{
> +       struct net_device *ndev = (struct net_device *) dev_id;
> +       struct hip04_priv *priv = netdev_priv(ndev);
> +       u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
> +       u32 val = DEF_INT_MASK;
> +
> +       writel_relaxed(val, priv->base + PPE_RINT);
> +
> +       if (ists & (RCV_INT | RCV_NOBUF)) {
> +               if (napi_schedule_prep(&priv->napi)) {
> +                       /* disable rx interrupt */
> +                       priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
> +                       writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
> +                       __napi_schedule(&priv->napi);
> +               }
> +       }

You should also process TX completion interrupts here

> +
> +       return IRQ_HANDLED;
> +}
> +
> +static void hip04_tx_reclaim(struct net_device *ndev, bool force)
> +{
> +       struct hip04_priv *priv = netdev_priv(ndev);
> +       unsigned tx_head = priv->tx_head;
> +       unsigned tx_tail = priv->tx_tail;
> +       struct tx_desc *desc = &priv->tx_desc[priv->tx_tail];
> +
> +       while (tx_tail != tx_head) {
> +               if (desc->send_addr != 0) {
> +                       if (force)
> +                               desc->send_addr = 0;
> +                       else
> +                               break;
> +               }
> +               if (priv->tx_phys[tx_tail]) {
> +                       dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
> +                               priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
> +                       priv->tx_phys[tx_tail] = 0;
> +               }
> +               dev_kfree_skb_irq(priv->tx_skb[tx_tail]);

dev_kfree_skb_irq() bypasses all sorts of SKB tracking, you might want
to use kfree_skb() here instead.

> +               priv->tx_skb[tx_tail] = NULL;
> +               tx_tail = TX_NEXT(tx_tail);
> +               priv->tx_count--;
> +       }
> +       priv->tx_tail = tx_tail;
> +}
> +
> +static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
> +{
> +       struct hip04_priv *priv = netdev_priv(ndev);
> +       struct net_device_stats *stats = &ndev->stats;
> +       unsigned int tx_head = priv->tx_head;
> +       struct tx_desc *desc = &priv->tx_desc[tx_head];
> +       dma_addr_t phys;
> +
> +       hip04_tx_reclaim(ndev, false);
> +
> +       if (priv->tx_count++ >= TX_DESC_NUM) {
> +               net_dbg_ratelimited("no TX space for packet\n");
> +               netif_stop_queue(ndev);
> +               return NETDEV_TX_BUSY;
> +       }
> +
> +       phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);

Missing dma_mapping_error() check here

> +       priv->tx_skb[tx_head] = skb;
> +       priv->tx_phys[tx_head] = phys;
> +       desc->send_addr = cpu_to_be32(phys);
> +       desc->send_size = cpu_to_be16(skb->len);
> +       desc->cfg = cpu_to_be32(DESC_DEF_CFG);
> +       phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
> +       desc->wb_addr = cpu_to_be32(phys);

Don't we need a barrier here to ensure that all stores are completed
before we hand this descriptor address to hip04_set_xmit_desc() which
should make DMA start processing it?

> +       skb_tx_timestamp(skb);
> +       hip04_set_xmit_desc(priv, phys);
> +       priv->tx_head = TX_NEXT(tx_head);
> +
> +       stats->tx_bytes += skb->len;
> +       stats->tx_packets++;

You cannot update the transmit stats here: start_xmit() just queues
packets for the DMA engine to process, which does not mean the DMA has
completed them. You should update the statistics in the tx_reclaim()
function.
-- 
Florian

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-24 14:14   ` Zhangfei Gao
@ 2014-03-24 15:18       ` Arnd Bergmann
  -1 siblings, 0 replies; 148+ messages in thread
From: Arnd Bergmann @ 2014-03-24 15:18 UTC (permalink / raw)
  To: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
  Cc: Zhangfei Gao, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-lFZ/pmaqli7XmaaqVzeoHQ, f.fainelli-Re5JQEeQqe8AvxtiuMwx3w,
	sergei.shtylyov-M4DtvfQ/ZS1MRgGoP+s0PdBPR1lH4CV8,
	mark.rutland-5wv7dgnIgG8, netdev-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA

On Monday 24 March 2014 22:14:56 Zhangfei Gao wrote:

> +
> +static void hip04_tx_reclaim(struct net_device *ndev, bool force)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	unsigned tx_head = priv->tx_head;
> +	unsigned tx_tail = priv->tx_tail;
> +	struct tx_desc *desc = &priv->tx_desc[priv->tx_tail];
> +
> +	while (tx_tail != tx_head) {
> +		if (desc->send_addr != 0) {
> +			if (force)
> +				desc->send_addr = 0;
> +			else
> +				break;
> +		}
> +		if (priv->tx_phys[tx_tail]) {
> +			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
> +				priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
> +			priv->tx_phys[tx_tail] = 0;
> +		}
> +		dev_kfree_skb_irq(priv->tx_skb[tx_tail]);
> +		priv->tx_skb[tx_tail] = NULL;
> +		tx_tail = TX_NEXT(tx_tail);
> +		priv->tx_count--;
> +	}
> +	priv->tx_tail = tx_tail;
> +}

I think you still need to find a solution to ensure that the tx reclaim is
called eventually through a method other than start_xmit.

> +
> +	priv->map = syscon_regmap_lookup_by_compatible("hisilicon,hip04-ppe");
> +	if (IS_ERR(priv->map)) {
> +		dev_warn(d, "no hisilicon,hip04-ppe\n");
> +		ret = PTR_ERR(priv->map);
> +		goto init_fail;
> +	}
> +
> +	n = of_parse_phandle(node, "port-handle", 0);
> +	if (n) {
> +		ret = of_property_read_u32(n, "reg", &priv->port);
> +		if (ret) {
> +			dev_warn(d, "no reg info\n");
> +			goto init_fail;
> +		}
> +
> +		ret = of_property_read_u32(n, "channel", &priv->chan);
> +		if (ret) {
> +			dev_warn(d, "no channel info\n");
> +			goto init_fail;
> +		}
> +	} else {
> +		dev_warn(d, "no port-handle\n");
> +		ret = -EINVAL;
> +		goto init_fail;
> +	}

Doing the lookup by "compatible" string doesn't really help you
solve the problem of a single-instance ppe at all, because that
function will only ever look at the first one. Use
syscon_regmap_lookup_by_phandle instead and pass the phandle
you get from the "port-handle" property.
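
With the two-cell port-handle proposed here, the lookup could be sketched as follows (illustrative only; the argument cells are read separately since syscon_regmap_lookup_by_phandle() only resolves the regmap):

```c
/* Sketch: resolve the ppe syscon through the port-handle phandle
 * and take port/channel from the phandle's two argument cells. */
struct of_phandle_args args;
int ret;

priv->map = syscon_regmap_lookup_by_phandle(node, "port-handle");
if (IS_ERR(priv->map))
	return PTR_ERR(priv->map);

ret = of_parse_phandle_with_fixed_args(node, "port-handle", 2, 0, &args);
if (ret)
	return ret;
priv->port = args.args[0];	/* e.g. 0xf1 */
priv->chan = args.args[1];	/* e.g. 0 */
of_node_put(args.np);
```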

Also, since you decided to treat the ppe as a dumb regmap, I would
recommend moving the 'reg' and 'channel' properties into arguments
of the port-handle link, and getting rid of the child nodes of
the ppe, like:

+       ppe: ppe@28c0000 {
+               compatible = "hisilicon,hip04-ppe", "syscon";
+               reg = <0x28c0000 0x10000>;
+       };
+
+       fe: ethernet@28b0000 {
+               compatible = "hisilicon,hip04-mac";
+               reg = <0x28b0000 0x10000>;
+               interrupts = <0 413 4>;
+               phy-mode = "mii";
+               port-handle = <&ppe 0xf1 0>;
+       };


	Arnd

^ permalink raw reply	[flat|nested] 148+ messages in thread

* [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-24 14:14 [PATCH v3 0/3] add hisilicon " Zhangfei Gao
@ 2014-03-24 14:14   ` Zhangfei Gao
  0 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-24 14:14 UTC (permalink / raw)
  To: davem, linux, arnd, f.fainelli, sergei.shtylyov, mark.rutland
  Cc: linux-arm-kernel, netdev, devicetree, Zhangfei Gao

Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
---
 drivers/net/ethernet/hisilicon/Makefile    |    2 +-
 drivers/net/ethernet/hisilicon/hip04_eth.c |  728 ++++++++++++++++++++++++++++
 2 files changed, 729 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c

diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 1d6eb6e..e6fe7af 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -2,4 +2,4 @@
 # Makefile for the HISILICON network device drivers.
 #
 
-obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
+obj-$(CONFIG_HIP04_ETH) += hip04_eth.o hip04_mdio.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
new file mode 100644
index 0000000..c1131b2
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -0,0 +1,728 @@
+
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+#include <linux/of_address.h>
+#include <linux/phy.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
+#include <linux/mfd/syscon.h>
+#include <linux/regmap.h>
+
+#define PPE_CFG_RX_ADDR			0x100
+#define PPE_CFG_POOL_GRP		0x300
+#define PPE_CFG_RX_BUF_SIZE		0x400
+#define PPE_CFG_RX_FIFO_SIZE		0x500
+#define PPE_CURR_BUF_CNT		0xa200
+
+#define GE_DUPLEX_TYPE			0x8
+#define GE_MAX_FRM_SIZE_REG		0x3c
+#define GE_PORT_MODE			0x40
+#define GE_PORT_EN			0x44
+#define GE_SHORT_RUNTS_THR_REG		0x50
+#define GE_TX_LOCAL_PAGE_REG		0x5c
+#define GE_TRANSMIT_CONTROL_REG		0x60
+#define GE_CF_CRC_STRIP_REG		0x1b0
+#define GE_MODE_CHANGE_EN		0x1b4
+#define GE_RECV_CONTROL_REG		0x1e0
+#define GE_STATION_MAC_ADDRESS		0x210
+#define PPE_CFG_TX_PKT_BD_ADDR		0x420
+#define PPE_CFG_MAX_FRAME_LEN_REG	0x408
+#define PPE_CFG_BUS_CTRL_REG		0x424
+#define PPE_CFG_RX_CTRL_REG		0x428
+#define PPE_CFG_RX_PKT_MODE_REG		0x438
+#define PPE_CFG_QOS_VMID_GEN		0x500
+#define PPE_CFG_RX_PKT_INT		0x538
+#define PPE_INTEN			0x600
+#define PPE_INTSTS			0x608
+#define PPE_RINT			0x604
+#define PPE_CFG_STS_MODE		0x700
+#define PPE_HIS_RX_PKT_CNT		0x804
+
+/* REG_INTERRUPT */
+#define RCV_INT				BIT(10)
+#define RCV_NOBUF			BIT(8)
+#define DEF_INT_MASK			0x41fdf
+
+#define RX_DESC_NUM			64
+#define TX_DESC_NUM			64
+#define TX_NEXT(N)			(((N) + 1) & (TX_DESC_NUM-1))
+#define RX_NEXT(N)			(((N) + 1) & (RX_DESC_NUM-1))
+
+#define GMAC_PPE_RX_PKT_MAX_LEN		379
+#define GMAC_MAX_PKT_LEN		1516
+#define DESC_DEF_CFG			0x14
+#define RX_BUF_SIZE			1600
+#define RX_PKT_ERR			0x3
+#define TX_TIMEOUT			(6 * HZ)
+
+#define DRV_NAME			"hip04-ether"
+
+struct tx_desc {
+	u32 send_addr;
+	u16 reserved_16;
+	u16 send_size;
+	u32 reserved_32;
+	u32 cfg;
+	u32 wb_addr;
+} ____cacheline_aligned;
+
+struct rx_desc {
+	u16 reserved_16;
+	u16 pkt_len;
+	u32 reserve1[3];
+	u32 pkt_err;
+	u32 reserve2[4];
+};
+
+struct hip04_priv {
+	void __iomem *base;
+	int phy_mode;
+	unsigned int chan;
+	unsigned int port;
+	unsigned int speed;
+	unsigned int duplex;
+	unsigned int reg_inten;
+
+	struct napi_struct napi;
+	struct net_device *ndev;
+
+	struct tx_desc *tx_desc;
+	dma_addr_t tx_desc_dma;
+	struct sk_buff *tx_skb[TX_DESC_NUM];
+	dma_addr_t tx_phys[TX_DESC_NUM];
+	unsigned int tx_head;
+	unsigned int tx_tail;
+	unsigned int tx_count;
+
+	unsigned char *rx_buf[RX_DESC_NUM];
+	dma_addr_t rx_phys[RX_DESC_NUM];
+	unsigned int rx_head;
+	unsigned int rx_buf_size;
+
+	struct device_node *phy_node;
+	struct phy_device *phy;
+	struct regmap *map;
+};
+
+static void hip04_config_port(struct hip04_priv *priv, u32 speed, u32 duplex)
+{
+	u32 val;
+
+	priv->speed = speed;
+	priv->duplex = duplex;
+
+	switch (priv->phy_mode) {
+	case PHY_INTERFACE_MODE_SGMII:
+		if (speed == SPEED_1000)
+			val = 8;
+		else
+			val = 7;
+		break;
+	case PHY_INTERFACE_MODE_MII:
+		val = 1;	/* SPEED_100 */
+		break;
+	default:
+		val = 0;
+		break;
+	}
+	writel_relaxed(val, priv->base + GE_PORT_MODE);
+
+	val = (duplex) ? BIT(0) : 0;
+	writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
+
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_MODE_CHANGE_EN);
+}
+
+static void hip04_reset_ppe(struct hip04_priv *priv)
+{
+	u32 val, tmp;
+
+	do {
+		regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
+		regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
+	} while (val & 0xfff);
+}
+
+static void hip04_config_fifo(struct hip04_priv *priv)
+{
+	u32 val;
+
+	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
+	val |= BIT(12);			/* PPE_HIS_RX_PKT_CNT read clear */
+	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
+
+	val = BIT(priv->port);
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
+
+	val = priv->port << 8;
+	val |= BIT(14);
+	writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
+
+	val = RX_BUF_SIZE;
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
+
+	val = RX_DESC_NUM << 16;	/* depth */
+	val |= BIT(11);			/* seq: first set first use */
+	val |= RX_DESC_NUM * priv->chan;	/* start_addr */
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_FIFO_SIZE, val);
+
+	/* pkt store format */
+	val = NET_IP_ALIGN << 11;	/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
+
+	/* following cfg required for 1000M */
+	/* pkt mode */
+	val = BIT(18);			/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
+
+	/* set bus ctrl */
+	val = BIT(14);			/* buffer locally release */
+	val |= BIT(0);			/* big endian */
+	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
+
+	/* set max pkt len, truncate if exceeded */
+	val = GMAC_PPE_RX_PKT_MAX_LEN;	/* max buffer len */
+	writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
+
+	/* set max len of each pkt */
+	val = GMAC_MAX_PKT_LEN;		/* max buffer len */
+	writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
+
+	/* set min len of each pkt */
+	val = 31;			/* min buffer len */
+	writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
+
+	/* tx */
+	val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
+	val |= BIT(5);			/* tx auto neg */
+	val |= BIT(6);			/* tx add crc */
+	val |= BIT(7);			/* tx short pad through */
+	writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
+
+	/* rx crc */
+	val = BIT(0);			/* rx strip crc */
+	writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
+
+	/* rx */
+	val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
+	val |= BIT(3);			/* rx strip pad */
+	val |= BIT(4);			/* runt pkt en */
+	writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
+
+	/* auto neg control */
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
+}
+
+static void hip04_mac_enable(struct net_device *ndev, bool enable)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	if (enable) {
+		/* enable tx & rx */
+		val = readl_relaxed(priv->base + GE_PORT_EN);
+		val |= BIT(1);		/* rx */
+		val |= BIT(2);		/* tx */
+		writel_relaxed(val, priv->base + GE_PORT_EN);
+
+		/* enable interrupt */
+		priv->reg_inten = DEF_INT_MASK;
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+		/* clear rx int */
+		val = RCV_INT;
+		writel_relaxed(val, priv->base + PPE_RINT);
+
+		/* config recv int */
+		val = BIT(6);		/* int threshold: 1 packet */
+		val |= 0x4;		/* recv timeout */
+		writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
+	} else {
+		/* disable int */
+		priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+		/* disable tx & rx */
+		val = readl_relaxed(priv->base + GE_PORT_EN);
+		val &= ~(BIT(1));	/* rx */
+		val &= ~(BIT(2));	/* tx */
+		writel_relaxed(val, priv->base + GE_PORT_EN);
+	}
+}
+
+static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	writel(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
+}
+
+static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
+}
+
+static u32 hip04_recv_cnt(struct hip04_priv *priv)
+{
+	return readl(priv->base + PPE_HIS_RX_PKT_CNT);
+}
+
+static void hip04_update_mac_address(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
+			priv->base + GE_STATION_MAC_ADDRESS);
+	writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
+			(ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
+			priv->base + GE_STATION_MAC_ADDRESS + 4);
+}
+
+static int hip04_set_mac_address(struct net_device *ndev, void *addr)
+{
+	eth_mac_addr(ndev, addr);
+	hip04_update_mac_address(ndev);
+	return 0;
+}
+
+static int hip04_rx_poll(struct napi_struct *napi, int budget)
+{
+	struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
+	struct net_device *ndev = priv->ndev;
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int cnt = hip04_recv_cnt(priv);
+	struct sk_buff *skb;
+	struct rx_desc *desc;
+	unsigned char *buf;
+	dma_addr_t phys;
+	int rx = 0;
+	u16 len;
+	u32 err;
+
+	while (cnt) {
+		buf = priv->rx_buf[priv->rx_head];
+		skb = build_skb(buf, priv->rx_buf_size);
+		if (unlikely(!skb))
+			net_dbg_ratelimited("build_skb failed\n");
+
+		dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_phys[priv->rx_head] = 0;
+
+		desc = (struct rx_desc *)skb->data;
+		len = be16_to_cpu(desc->pkt_len);
+		err = be32_to_cpu(desc->pkt_err);
+
+		if (len > RX_BUF_SIZE)
+			len = RX_BUF_SIZE;
+		if (len == 0)
+			break;
+
+		if (err & RX_PKT_ERR) {
+			dev_kfree_skb_any(skb);
+			stats->rx_dropped++;
+			stats->rx_errors++;
+			continue;
+		}
+
+		stats->rx_packets++;
+		stats->rx_bytes += len;
+
+		skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+		skb_put(skb, len);
+		skb->protocol = eth_type_trans(skb, ndev);
+		napi_gro_receive(&priv->napi, skb);
+
+		buf = netdev_alloc_frag(priv->rx_buf_size);
+		if (!buf)
+			return -ENOMEM;
+		phys = dma_map_single(&ndev->dev, buf,
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_buf[priv->rx_head] = buf;
+		priv->rx_phys[priv->rx_head] = phys;
+		hip04_set_recv_desc(priv, phys);
+
+		priv->rx_head = RX_NEXT(priv->rx_head);
+		if (rx++ >= budget)
+			break;
+
+		if (--cnt == 0)
+			cnt = hip04_recv_cnt(priv);
+	}
+
+	if (rx < budget) {
+		napi_complete(napi);
+
+		/* enable rx interrupt */
+		priv->reg_inten |= RCV_INT | RCV_NOBUF;
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+	}
+
+	return rx;
+}
+
+static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
+{
+	struct net_device *ndev = (struct net_device *) dev_id;
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
+	u32 val = DEF_INT_MASK;
+
+	writel_relaxed(val, priv->base + PPE_RINT);
+
+	if (ists & (RCV_INT | RCV_NOBUF)) {
+		if (napi_schedule_prep(&priv->napi)) {
+			/* disable rx interrupt */
+			priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+			writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+			__napi_schedule(&priv->napi);
+		}
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void hip04_tx_reclaim(struct net_device *ndev, bool force)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	unsigned tx_head = priv->tx_head;
+	unsigned tx_tail = priv->tx_tail;
+	struct tx_desc *desc = &priv->tx_desc[priv->tx_tail];
+
+	while (tx_tail != tx_head) {
+		if (desc->send_addr != 0) {
+			if (force)
+				desc->send_addr = 0;
+			else
+				break;
+		}
+		if (priv->tx_phys[tx_tail]) {
+			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
+				priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
+			priv->tx_phys[tx_tail] = 0;
+		}
+		dev_kfree_skb_irq(priv->tx_skb[tx_tail]);
+		priv->tx_skb[tx_tail] = NULL;
+		tx_tail = TX_NEXT(tx_tail);
+		priv->tx_count--;
+	}
+	priv->tx_tail = tx_tail;
+}
+
+static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int tx_head = priv->tx_head;
+	struct tx_desc *desc = &priv->tx_desc[tx_head];
+	dma_addr_t phys;
+
+	hip04_tx_reclaim(ndev, false);
+
+	if (priv->tx_count++ >= TX_DESC_NUM) {
+		net_dbg_ratelimited("no TX space for packet\n");
+		netif_stop_queue(ndev);
+		return NETDEV_TX_BUSY;
+	}
+
+	phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
+	priv->tx_skb[tx_head] = skb;
+	priv->tx_phys[tx_head] = phys;
+	desc->send_addr = cpu_to_be32(phys);
+	desc->send_size = cpu_to_be16(skb->len);
+	desc->cfg = cpu_to_be32(DESC_DEF_CFG);
+	phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
+	desc->wb_addr = cpu_to_be32(phys);
+	skb_tx_timestamp(skb);
+	hip04_set_xmit_desc(priv, phys);
+	priv->tx_head = TX_NEXT(tx_head);
+
+	stats->tx_bytes += skb->len;
+	stats->tx_packets++;
+
+	return NETDEV_TX_OK;
+}
+
+static void hip04_adjust_link(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct phy_device *phy = priv->phy;
+
+	if ((priv->speed != phy->speed) || (priv->duplex != phy->duplex)) {
+		hip04_config_port(priv, phy->speed, phy->duplex);
+		phy_print_status(phy);
+	}
+}
+
+static int hip04_mac_open(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->rx_head = 0;
+	priv->tx_head = 0;
+	priv->tx_tail = 0;
+	priv->tx_count = 0;
+
+	hip04_reset_ppe(priv);
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		dma_addr_t phys;
+
+		phys = dma_map_single(&ndev->dev, priv->rx_buf[i],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_phys[i] = phys;
+		hip04_set_recv_desc(priv, phys);
+	}
+
+	if (priv->phy_node) {
+		priv->phy = of_phy_connect(ndev, priv->phy_node,
+			&hip04_adjust_link, 0, priv->phy_mode);
+		if (!priv->phy)
+			return -ENODEV;
+		phy_start(priv->phy);
+	}
+
+	netif_start_queue(ndev);
+	hip04_mac_enable(ndev, true);
+	napi_enable(&priv->napi);
+
+	return 0;
+}
+
+static int hip04_mac_stop(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	if (priv->phy)
+		phy_disconnect(priv->phy);
+
+	priv->phy = NULL;
+	napi_disable(&priv->napi);
+	netif_stop_queue(ndev);
+	hip04_mac_enable(ndev, false);
+	hip04_tx_reclaim(ndev, true);
+	hip04_reset_ppe(priv);
+
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		if (priv->rx_phys[i]) {
+			dma_unmap_single(&ndev->dev, priv->rx_phys[i],
+					RX_BUF_SIZE, DMA_FROM_DEVICE);
+			priv->rx_phys[i] = 0;
+		}
+	}
+
+	return 0;
+}
+
+static void hip04_timeout(struct net_device *ndev)
+{
+	netif_wake_queue(ndev);
+	return;
+}
+
+static const struct net_device_ops hip04_netdev_ops = {
+	.ndo_open		= hip04_mac_open,
+	.ndo_stop		= hip04_mac_stop,
+	.ndo_start_xmit		= hip04_mac_start_xmit,
+	.ndo_set_mac_address	= hip04_set_mac_address,
+	.ndo_tx_timeout         = hip04_timeout,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_change_mtu		= eth_change_mtu,
+};
+
+static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->tx_desc = dma_alloc_coherent(d,
+			TX_DESC_NUM * sizeof(struct tx_desc),
+			&priv->tx_desc_dma, GFP_KERNEL);
+	if (!priv->tx_desc)
+		return -ENOMEM;
+
+	priv->rx_buf_size = RX_BUF_SIZE +
+			    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		priv->rx_buf[i] = netdev_alloc_frag(priv->rx_buf_size);
+		if (!priv->rx_buf[i])
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void hip04_free_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	for (i = 0; i < RX_DESC_NUM; i++)
+		if (priv->rx_buf[i])
+			put_page(virt_to_head_page(priv->rx_buf[i]));
+
+	for (i = 0; i < TX_DESC_NUM; i++)
+		if (priv->tx_skb[i])
+			dev_kfree_skb_any(priv->tx_skb[i]);
+
+	dma_free_coherent(d, TX_DESC_NUM * sizeof(struct tx_desc),
+			priv->tx_desc, priv->tx_desc_dma);
+}
+
+static int hip04_mac_probe(struct platform_device *pdev)
+{
+	struct device *d = &pdev->dev;
+	struct device_node *n, *node = d->of_node;
+	struct net_device *ndev;
+	struct hip04_priv *priv;
+	struct resource *res;
+	int irq;
+	int ret;
+
+	ndev = alloc_etherdev(sizeof(struct hip04_priv));
+	if (!ndev)
+		return -ENOMEM;
+
+	priv = netdev_priv(ndev);
+	priv->ndev = ndev;
+	platform_set_drvdata(pdev, ndev);
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_ioremap_resource(d, res);
+	if (IS_ERR(priv->base)) {
+		ret = PTR_ERR(priv->base);
+		goto init_fail;
+	}
+
+	priv->map = syscon_regmap_lookup_by_compatible("hisilicon,hip04-ppe");
+	if (IS_ERR(priv->map)) {
+		dev_warn(d, "no hisilicon,hip04-ppe\n");
+		ret = PTR_ERR(priv->map);
+		goto init_fail;
+	}
+
+	n = of_parse_phandle(node, "port-handle", 0);
+	if (n) {
+		ret = of_property_read_u32(n, "reg", &priv->port);
+		if (ret) {
+			dev_warn(d, "no reg info\n");
+			goto init_fail;
+		}
+
+		ret = of_property_read_u32(n, "channel", &priv->chan);
+		if (ret) {
+			dev_warn(d, "no channel info\n");
+			goto init_fail;
+		}
+	} else {
+		dev_warn(d, "no port-handle\n");
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	priv->phy_mode = of_get_phy_mode(node);
+	if (priv->phy_mode < 0) {
+		dev_warn(d, "cannot find phy-mode\n");
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq <= 0) {
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	ether_setup(ndev);
+	ndev->netdev_ops = &hip04_netdev_ops;
+	ndev->watchdog_timeo = TX_TIMEOUT;
+	ndev->priv_flags |= IFF_UNICAST_FLT;
+	ndev->irq = irq;
+	netif_napi_add(ndev, &priv->napi, hip04_rx_poll, RX_DESC_NUM);
+	SET_NETDEV_DEV(ndev, &pdev->dev);
+
+	hip04_reset_ppe(priv);
+	if (priv->phy_mode == PHY_INTERFACE_MODE_MII)
+		hip04_config_port(priv, SPEED_100, DUPLEX_FULL);
+
+	hip04_config_fifo(priv);
+	random_ether_addr(ndev->dev_addr);
+	hip04_update_mac_address(ndev);
+
+	ret = hip04_alloc_ring(ndev, d);
+	if (ret) {
+		netdev_err(ndev, "alloc ring fail\n");
+		goto alloc_fail;
+	}
+
+	ret = devm_request_irq(d, irq, hip04_mac_interrupt,
+				0, pdev->name, ndev);
+	if (ret) {
+		netdev_err(ndev, "devm_request_irq failed\n");
+		goto alloc_fail;
+	}
+
+	priv->phy_node = of_parse_phandle(node, "phy-handle", 0);
+
+	ret = register_netdev(ndev);
+	if (ret) {
+		free_netdev(ndev);
+		goto alloc_fail;
+	}
+
+	return 0;
+
+alloc_fail:
+	hip04_free_ring(ndev, d);
+init_fail:
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+	return ret;
+}
+
+static int hip04_remove(struct platform_device *pdev)
+{
+	struct net_device *ndev = platform_get_drvdata(pdev);
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct device *d = &pdev->dev;
+
+	hip04_free_ring(ndev, d);
+	unregister_netdev(ndev);
+	free_irq(ndev->irq, ndev);
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+
+	return 0;
+}
+
+static const struct of_device_id hip04_mac_match[] = {
+	{ .compatible = "hisilicon,hip04-mac" },
+	{ }
+};
+
+static struct platform_driver hip04_mac_driver = {
+	.probe	= hip04_mac_probe,
+	.remove	= hip04_remove,
+	.driver	= {
+		.name		= DRV_NAME,
+		.owner		= THIS_MODULE,
+		.of_match_table	= hip04_mac_match,
+	},
+};
+module_platform_driver(hip04_mac_driver);
+
+MODULE_DESCRIPTION("HISILICON HIP04 Ethernet driver");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:hip04-ether");
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
@ 2014-03-24 14:14   ` Zhangfei Gao
  0 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-24 14:14 UTC (permalink / raw)
  To: linux-arm-kernel

Support Hisilicon hip04 ethernet driver, including 100M / 1000M controller

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
---
 drivers/net/ethernet/hisilicon/Makefile    |    2 +-
 drivers/net/ethernet/hisilicon/hip04_eth.c |  728 ++++++++++++++++++++++++++++
 2 files changed, 729 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c

diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 1d6eb6e..e6fe7af 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -2,4 +2,4 @@
 # Makefile for the HISILICON network device drivers.
 #
 
-obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
+obj-$(CONFIG_HIP04_ETH) += hip04_eth.o hip04_mdio.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
new file mode 100644
index 0000000..c1131b2
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -0,0 +1,728 @@
+
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+#include <linux/of_address.h>
+#include <linux/phy.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
+#include <linux/mfd/syscon.h>
+#include <linux/regmap.h>
+
+#define PPE_CFG_RX_ADDR			0x100
+#define PPE_CFG_POOL_GRP		0x300
+#define PPE_CFG_RX_BUF_SIZE		0x400
+#define PPE_CFG_RX_FIFO_SIZE		0x500
+#define PPE_CURR_BUF_CNT		0xa200
+
+#define GE_DUPLEX_TYPE			0x8
+#define GE_MAX_FRM_SIZE_REG		0x3c
+#define GE_PORT_MODE			0x40
+#define GE_PORT_EN			0x44
+#define GE_SHORT_RUNTS_THR_REG		0x50
+#define GE_TX_LOCAL_PAGE_REG		0x5c
+#define GE_TRANSMIT_CONTROL_REG		0x60
+#define GE_CF_CRC_STRIP_REG		0x1b0
+#define GE_MODE_CHANGE_EN		0x1b4
+#define GE_RECV_CONTROL_REG		0x1e0
+#define GE_STATION_MAC_ADDRESS		0x210
+#define PPE_CFG_TX_PKT_BD_ADDR		0x420
+#define PPE_CFG_MAX_FRAME_LEN_REG	0x408
+#define PPE_CFG_BUS_CTRL_REG		0x424
+#define PPE_CFG_RX_CTRL_REG		0x428
+#define PPE_CFG_RX_PKT_MODE_REG		0x438
+#define PPE_CFG_QOS_VMID_GEN		0x500
+#define PPE_CFG_RX_PKT_INT		0x538
+#define PPE_INTEN			0x600
+#define PPE_INTSTS			0x608
+#define PPE_RINT			0x604
+#define PPE_CFG_STS_MODE		0x700
+#define PPE_HIS_RX_PKT_CNT		0x804
+
+/* REG_INTERRUPT */
+#define RCV_INT				BIT(10)
+#define RCV_NOBUF			BIT(8)
+#define DEF_INT_MASK			0x41fdf
+
+#define RX_DESC_NUM			64
+#define TX_DESC_NUM			64
+#define TX_NEXT(N)			(((N) + 1) & (TX_DESC_NUM-1))
+#define RX_NEXT(N)			(((N) + 1) & (RX_DESC_NUM-1))
+
+#define GMAC_PPE_RX_PKT_MAX_LEN		379
+#define GMAC_MAX_PKT_LEN		1516
+#define DESC_DEF_CFG			0x14
+#define RX_BUF_SIZE			1600
+#define RX_PKT_ERR			0x3
+#define TX_TIMEOUT			(6 * HZ)
+
+#define DRV_NAME			"hip04-ether"
+
+struct tx_desc {
+	u32 send_addr;
+	u16 reserved_16;
+	u16 send_size;
+	u32 reserved_32;
+	u32 cfg;
+	u32 wb_addr;
+} ____cacheline_aligned;
+
+struct rx_desc {
+	u16 reserved_16;
+	u16 pkt_len;
+	u32 reserve1[3];
+	u32 pkt_err;
+	u32 reserve2[4];
+};
+
+struct hip04_priv {
+	void __iomem *base;
+	int phy_mode;
+	int chan;
+	unsigned int port;
+	unsigned int speed;
+	unsigned int duplex;
+	unsigned int reg_inten;
+
+	struct napi_struct napi;
+	struct net_device *ndev;
+
+	struct tx_desc *tx_desc;
+	dma_addr_t tx_desc_dma;
+	struct sk_buff *tx_skb[TX_DESC_NUM];
+	dma_addr_t tx_phys[TX_DESC_NUM];
+	unsigned int tx_head;
+	unsigned int tx_tail;
+	unsigned int tx_count;
+
+	unsigned char *rx_buf[RX_DESC_NUM];
+	dma_addr_t rx_phys[RX_DESC_NUM];
+	unsigned int rx_head;
+	unsigned int rx_buf_size;
+
+	struct device_node *phy_node;
+	struct phy_device *phy;
+	struct regmap *map;
+};
+
+static void hip04_config_port(struct hip04_priv *priv, u32 speed, u32 duplex)
+{
+	u32 val;
+
+	priv->speed = speed;
+	priv->duplex = duplex;
+
+	switch (priv->phy_mode) {
+	case PHY_INTERFACE_MODE_SGMII:
+		if (speed == SPEED_1000)
+			val = 8;
+		else
+			val = 7;
+		break;
+	case PHY_INTERFACE_MODE_MII:
+		val = 1;	/* SPEED_100 */
+		break;
+	default:
+		val = 0;
+		break;
+	}
+	writel_relaxed(val, priv->base + GE_PORT_MODE);
+
+	val = (duplex) ? BIT(0) : 0;
+	writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
+
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_MODE_CHANGE_EN);
+}
+
+static void hip04_reset_ppe(struct hip04_priv *priv)
+{
+	u32 val, tmp;
+
+	do {
+		regmap_read(priv->map, priv->port * 4 + PPE_CURR_BUF_CNT, &val);
+		regmap_read(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, &tmp);
+	} while (val & 0xfff);
+}
+
+static void hip04_config_fifo(struct hip04_priv *priv)
+{
+	u32 val;
+
+	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
+	val |= BIT(12);			/* PPE_HIS_RX_PKT_CNT read clear */
+	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
+
+	val = BIT(priv->port);
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_POOL_GRP, val);
+
+	val = priv->port << 8;
+	val |= BIT(14);
+	writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
+
+	val = RX_BUF_SIZE;
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_BUF_SIZE, val);
+
+	val = RX_DESC_NUM << 16;	/* depth */
+	val |= BIT(11);			/* seq: first set first ues */
+	val |= RX_DESC_NUM * priv->chan;	/* start_addr */
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_FIFO_SIZE, val);
+
+	/* pkt store format */
+	val = NET_IP_ALIGN << 11;	/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
+
+	/* following cfg required for 1000M */
+	/* pkt mode */
+	val = BIT(18);			/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
+
+	/* set bus ctrl */
+	val = BIT(14);			/* buffer locally release */
+	val |= BIT(0);			/* big endian */
+	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
+
+	/* set max pkt len, curtail if exceed */
+	val = GMAC_PPE_RX_PKT_MAX_LEN;	/* max buffer len */
+	writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
+
+	/* set max len of each pkt */
+	val = GMAC_MAX_PKT_LEN;		/* max buffer len */
+	writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
+
+	/* set min len of each pkt */
+	val = 31;			/* min buffer len */
+	writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
+
+	/* tx */
+	val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
+	val |= BIT(5);			/* tx auto neg */
+	val |= BIT(6);			/* tx add crc */
+	val |= BIT(7);			/* tx short pad through */
+	writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
+
+	/* rx crc */
+	val = BIT(0);			/* rx strip crc */
+	writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
+
+	/* rx */
+	val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
+	val |= BIT(3);			/* rx strip pad */
+	val |= BIT(4);			/* run pkt en */
+	writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
+
+	/* auto neg control */
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
+}
+
+static void hip04_mac_enable(struct net_device *ndev, bool enable)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	if (enable) {
+		/* enable tx & rx */
+		val = readl_relaxed(priv->base + GE_PORT_EN);
+		val |= BIT(1);		/* rx*/
+		val |= BIT(2);		/* tx*/
+		writel_relaxed(val, priv->base + GE_PORT_EN);
+
+		/* enable interrupt */
+		priv->reg_inten = DEF_INT_MASK;
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+		/* clear rx int */
+		val = RCV_INT;
+		writel_relaxed(val, priv->base + PPE_RINT);
+
+		/* config recv int*/
+		val = BIT(6);		/* int threshold 1 package */
+		val |= 0x4;		/* recv timeout */
+		writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
+	} else {
+		/* disable int */
+		priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+		/* disable tx & rx */
+		val = readl_relaxed(priv->base + GE_PORT_EN);
+		val &= ~(BIT(1));	/* rx*/
+		val &= ~(BIT(2));	/* tx*/
+		writel_relaxed(val, priv->base + GE_PORT_EN);
+	}
+}
+
+static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	writel(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
+}
+
+static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	regmap_write(priv->map, priv->port * 4 + PPE_CFG_RX_ADDR, phys);
+}
+
+static u32 hip04_recv_cnt(struct hip04_priv *priv)
+{
+	return readl(priv->base + PPE_HIS_RX_PKT_CNT);
+}
+
+static void hip04_update_mac_address(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
+			priv->base + GE_STATION_MAC_ADDRESS);
+	writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
+			(ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
+			priv->base + GE_STATION_MAC_ADDRESS + 4);
+}
+
+static int hip04_set_mac_address(struct net_device *ndev, void *addr)
+{
+	eth_mac_addr(ndev, addr);
+	hip04_update_mac_address(ndev);
+	return 0;
+}
+
+static int hip04_rx_poll(struct napi_struct *napi, int budget)
+{
+	struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
+	struct net_device *ndev = priv->ndev;
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int cnt = hip04_recv_cnt(priv);
+	struct sk_buff *skb;
+	struct rx_desc *desc;
+	unsigned char *buf;
+	dma_addr_t phys;
+	int rx = 0;
+	u16 len;
+	u32 err;
+
+	while (cnt) {
+		buf = priv->rx_buf[priv->rx_head];
+		skb = build_skb(buf, priv->rx_buf_size);
+		if (unlikely(!skb))
+			net_dbg_ratelimited("build_skb failed\n");
+
+		dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_phys[priv->rx_head] = 0;
+
+		desc = (struct rx_desc *)skb->data;
+		len = be16_to_cpu(desc->pkt_len);
+		err = be32_to_cpu(desc->pkt_err);
+
+		if (len > RX_BUF_SIZE)
+			len = RX_BUF_SIZE;
+		if (0 == len)
+			break;
+
+		if (err & RX_PKT_ERR) {
+			dev_kfree_skb_any(skb);
+			stats->rx_dropped++;
+			stats->rx_errors++;
+			continue;
+		}
+
+		stats->rx_packets++;
+		stats->rx_bytes += len;
+
+		skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+		skb_put(skb, len);
+		skb->protocol = eth_type_trans(skb, ndev);
+		napi_gro_receive(&priv->napi, skb);
+
+		buf = netdev_alloc_frag(priv->rx_buf_size);
+		if (!buf)
+			return -ENOMEM;
+		phys = dma_map_single(&ndev->dev, buf,
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_buf[priv->rx_head] = buf;
+		priv->rx_phys[priv->rx_head] = phys;
+		hip04_set_recv_desc(priv, phys);
+
+		priv->rx_head = RX_NEXT(priv->rx_head);
+		if (rx++ >= budget)
+			break;
+
+		if (--cnt == 0)
+			cnt = hip04_recv_cnt(priv);
+	}
+
+	if (rx < budget) {
+		napi_complete(napi);
+
+		/* enable rx interrupt */
+		priv->reg_inten |= RCV_INT | RCV_NOBUF;
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+	}
+
+	return rx;
+}
+
+static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
+{
+	struct net_device *ndev = (struct net_device *) dev_id;
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
+	u32 val = DEF_INT_MASK;
+
+	writel_relaxed(val, priv->base + PPE_RINT);
+
+	if (ists & (RCV_INT | RCV_NOBUF)) {
+		if (napi_schedule_prep(&priv->napi)) {
+			/* disable rx interrupt */
+			priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+			writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+			__napi_schedule(&priv->napi);
+		}
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void hip04_tx_reclaim(struct net_device *ndev, bool force)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	unsigned tx_head = priv->tx_head;
+	unsigned tx_tail = priv->tx_tail;
+	struct tx_desc *desc = &priv->tx_desc[priv->tx_tail];
+
+	while (tx_tail != tx_head) {
+		if (desc->send_addr != 0) {
+			if (force)
+				desc->send_addr = 0;
+			else
+				break;
+		}
+		if (priv->tx_phys[tx_tail]) {
+			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
+				priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
+			priv->tx_phys[tx_tail] = 0;
+		}
+		dev_kfree_skb_irq(priv->tx_skb[tx_tail]);
+		priv->tx_skb[tx_tail] = NULL;
+		tx_tail = TX_NEXT(tx_tail);
+		priv->tx_count--;
+	}
+	priv->tx_tail = tx_tail;
+}
+
+static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int tx_head = priv->tx_head;
+	struct tx_desc *desc = &priv->tx_desc[tx_head];
+	dma_addr_t phys;
+
+	hip04_tx_reclaim(ndev, false);
+
+	if (priv->tx_count++ >= TX_DESC_NUM) {
+		net_dbg_ratelimited("no TX space for packet\n");
+		netif_stop_queue(ndev);
+		return NETDEV_TX_BUSY;
+	}
+
+	phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
+	priv->tx_skb[tx_head] = skb;
+	priv->tx_phys[tx_head] = phys;
+	desc->send_addr = cpu_to_be32(phys);
+	desc->send_size = cpu_to_be16(skb->len);
+	desc->cfg = cpu_to_be32(DESC_DEF_CFG);
+	phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
+	desc->wb_addr = cpu_to_be32(phys);
+	skb_tx_timestamp(skb);
+	hip04_set_xmit_desc(priv, phys);
+	priv->tx_head = TX_NEXT(tx_head);
+
+	stats->tx_bytes += skb->len;
+	stats->tx_packets++;
+
+	return NETDEV_TX_OK;
+}
+
+static void hip04_adjust_link(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct phy_device *phy = priv->phy;
+
+	if ((priv->speed != phy->speed) || (priv->duplex != phy->duplex)) {
+		hip04_config_port(priv, phy->speed, phy->duplex);
+		phy_print_status(phy);
+	}
+}
+
+static int hip04_mac_open(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->rx_head = 0;
+	priv->tx_head = 0;
+	priv->tx_tail = 0;
+	priv->tx_count = 0;
+
+	hip04_reset_ppe(priv);
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		dma_addr_t phys;
+
+		phys = dma_map_single(&ndev->dev, priv->rx_buf[i],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_phys[i] = phys;
+		hip04_set_recv_desc(priv, phys);
+	}
+
+	if (priv->phy_node) {
+		priv->phy = of_phy_connect(ndev, priv->phy_node,
+			&hip04_adjust_link, 0, priv->phy_mode);
+		if (!priv->phy)
+			return -ENODEV;
+		phy_start(priv->phy);
+	}
+
+	netif_start_queue(ndev);
+	hip04_mac_enable(ndev, true);
+	napi_enable(&priv->napi);
+
+	return 0;
+}
+
+static int hip04_mac_stop(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	if (priv->phy)
+		phy_disconnect(priv->phy);
+
+	priv->phy = NULL;
+	napi_disable(&priv->napi);
+	netif_stop_queue(ndev);
+	hip04_mac_enable(ndev, false);
+	hip04_tx_reclaim(ndev, true);
+	hip04_reset_ppe(priv);
+
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		if (priv->rx_phys[i]) {
+			dma_unmap_single(&ndev->dev, priv->rx_phys[i],
+					RX_BUF_SIZE, DMA_FROM_DEVICE);
+			priv->rx_phys[i] = 0;
+		}
+	}
+
+	return 0;
+}
+
+static void hip04_timeout(struct net_device *ndev)
+{
+	netif_wake_queue(ndev);
+}
+
+static const struct net_device_ops hip04_netdev_ops = {
+	.ndo_open		= hip04_mac_open,
+	.ndo_stop		= hip04_mac_stop,
+	.ndo_start_xmit		= hip04_mac_start_xmit,
+	.ndo_set_mac_address	= hip04_set_mac_address,
+	.ndo_tx_timeout         = hip04_timeout,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_change_mtu		= eth_change_mtu,
+};
+
+static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->tx_desc = dma_alloc_coherent(d,
+			TX_DESC_NUM * sizeof(struct tx_desc),
+			&priv->tx_desc_dma, GFP_KERNEL);
+	if (!priv->tx_desc)
+		return -ENOMEM;
+
+	priv->rx_buf_size = RX_BUF_SIZE +
+			    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		priv->rx_buf[i] = netdev_alloc_frag(priv->rx_buf_size);
+		if (!priv->rx_buf[i])
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void hip04_free_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	for (i = 0; i < RX_DESC_NUM; i++)
+		if (priv->rx_buf[i])
+			put_page(virt_to_head_page(priv->rx_buf[i]));
+
+	for (i = 0; i < TX_DESC_NUM; i++)
+		if (priv->tx_skb[i])
+			dev_kfree_skb_any(priv->tx_skb[i]);
+
+	dma_free_coherent(d, TX_DESC_NUM * sizeof(struct tx_desc),
+			priv->tx_desc, priv->tx_desc_dma);
+}
+
+static int hip04_mac_probe(struct platform_device *pdev)
+{
+	struct device *d = &pdev->dev;
+	struct device_node *n, *node = d->of_node;
+	struct net_device *ndev;
+	struct hip04_priv *priv;
+	struct resource *res;
+	int irq;
+	int ret;
+
+	ndev = alloc_etherdev(sizeof(struct hip04_priv));
+	if (!ndev)
+		return -ENOMEM;
+
+	priv = netdev_priv(ndev);
+	priv->ndev = ndev;
+	platform_set_drvdata(pdev, ndev);
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_ioremap_resource(d, res);
+	if (IS_ERR(priv->base)) {
+		ret = PTR_ERR(priv->base);
+		goto init_fail;
+	}
+
+	priv->map = syscon_regmap_lookup_by_compatible("hisilicon,hip04-ppe");
+	if (IS_ERR(priv->map)) {
+		dev_warn(d, "no hisilicon,hip04-ppe\n");
+		ret = PTR_ERR(priv->map);
+		goto init_fail;
+	}
+
+	n = of_parse_phandle(node, "port-handle", 0);
+	if (n) {
+		ret = of_property_read_u32(n, "reg", &priv->port);
+		if (ret) {
+			dev_warn(d, "no reg info\n");
+			goto init_fail;
+		}
+
+		ret = of_property_read_u32(n, "channel", &priv->chan);
+		if (ret) {
+			dev_warn(d, "no channel info\n");
+			goto init_fail;
+		}
+	} else {
+		dev_warn(d, "no port-handle\n");
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	priv->phy_mode = of_get_phy_mode(node);
+	if (priv->phy_mode < 0) {
+		dev_warn(d, "phy-mode not found\n");
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq <= 0) {
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	ndev->netdev_ops = &hip04_netdev_ops;
+	ndev->watchdog_timeo = TX_TIMEOUT;
+	ndev->priv_flags |= IFF_UNICAST_FLT;
+	ndev->irq = irq;
+	netif_napi_add(ndev, &priv->napi, hip04_rx_poll, RX_DESC_NUM);
+	SET_NETDEV_DEV(ndev, &pdev->dev);
+
+	hip04_reset_ppe(priv);
+	if (priv->phy_mode == PHY_INTERFACE_MODE_MII)
+		hip04_config_port(priv, SPEED_100, DUPLEX_FULL);
+
+	hip04_config_fifo(priv);
+	random_ether_addr(ndev->dev_addr);
+	hip04_update_mac_address(ndev);
+
+	ret = hip04_alloc_ring(ndev, d);
+	if (ret) {
+		netdev_err(ndev, "alloc ring fail\n");
+		goto alloc_fail;
+	}
+
+	ret = devm_request_irq(d, irq, hip04_mac_interrupt,
+				0, pdev->name, ndev);
+	if (ret) {
+		netdev_err(ndev, "devm_request_irq failed\n");
+		goto alloc_fail;
+	}
+
+	priv->phy_node = of_parse_phandle(node, "phy-handle", 0);
+
+	ret = register_netdev(ndev);
+	if (ret)
+		goto alloc_fail;
+
+	return 0;
+
+alloc_fail:
+	hip04_free_ring(ndev, d);
+init_fail:
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+	return ret;
+}
+
+static int hip04_remove(struct platform_device *pdev)
+{
+	struct net_device *ndev = platform_get_drvdata(pdev);
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct device *d = &pdev->dev;
+
+	unregister_netdev(ndev);
+	hip04_free_ring(ndev, d);
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+
+	return 0;
+}
+
+static const struct of_device_id hip04_mac_match[] = {
+	{ .compatible = "hisilicon,hip04-mac" },
+	{ }
+};
+
+static struct platform_driver hip04_mac_driver = {
+	.probe	= hip04_mac_probe,
+	.remove	= hip04_remove,
+	.driver	= {
+		.name		= DRV_NAME,
+		.owner		= THIS_MODULE,
+		.of_match_table	= hip04_mac_match,
+	},
+};
+module_platform_driver(hip04_mac_driver);
+
+MODULE_DESCRIPTION("HISILICON P04 Ethernet driver");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:hip04-ether");
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-22  1:18       ` zhangfei
@ 2014-03-22  8:08         ` Arnd Bergmann
  -1 siblings, 0 replies; 148+ messages in thread
From: Arnd Bergmann @ 2014-03-22  8:08 UTC (permalink / raw)
  To: zhangfei
  Cc: David S. Miller, linux, f.fainelli, mark.rutland,
	sergei.shtylyov, linux-arm-kernel, netdev, devicetree

On Saturday 22 March 2014 09:18:35 zhangfei wrote:
> >> +static void hip04_tx_reclaim(struct net_device *ndev, bool force)
> >> +{
> >> +	struct hip04_priv *priv = netdev_priv(ndev);
> >> +	unsigned tx_head = priv->tx_head;
> >> +	unsigned tx_tail = priv->tx_tail;
> >> +	struct tx_desc *desc = &priv->tx_desc[priv->tx_tail];
> >> +
> >> +	while (tx_tail != tx_head) {
> >> +		if (desc->send_addr != 0) {
> >> +			if (force)
> >> +				desc->send_addr = 0;
> >> +			else
> >> +				break;
> >> +		}
> >> +		if (priv->tx_phys[tx_tail]) {
> >> +			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
> >> +				priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
> >> +			priv->tx_phys[tx_tail] = 0;
> >> +		}
> >> +		dev_kfree_skb_irq(priv->tx_skb[tx_tail]);
> >> +		priv->tx_skb[tx_tail] = NULL;
> >> +		tx_tail = TX_NEXT(tx_tail);
> >> +		priv->tx_count--;
> >> +	}
> >> +	priv->tx_tail = tx_tail;
> >> +}
> >
> > You call this function from start_xmit(), which may be too early, causing the
> > dma_unmap_single() and dev_kfree_skb_irq() functions to be called while the
> > device is still accessing the data. This is bad.
> 
> There is a protection.
> Only after xmit done, desc->send_addr = 0, which is cleared by hardware.

Ok, I see.

> > You have to ensure that you only ever clean up tx buffers that have been
> > successfully transmitted. Also, you should use an interrupt to notify you
> > of this in case there is no further xmit packet. Otherwise you may have
> > a user space program waiting indefinitely for a single packet to get sent
> > on a socket.
> >
> > It's ok to also call the cleanup from start_xmit, but calling it from the
> > poll() function or another appropriate place is required.
> 
> There is no transmit done interrupt, so relying on every xmit to reclaim 
> finished buffer.
> I thought it would be enough, since there are TX_DESC_NUM descs.
> It is a good idea also put reclaim in poll, then will add spin lock etc.

It may be simpler to call napi_schedule() when you want to do tx descriptor
cleanup and have the function always be called in one place. If you do this
carefully, you can probably avoid most of the locking.

> >> +	priv->id = of_alias_get_id(node, "ethernet");
> >> +	if (priv->id < 0) {
> >> +		dev_warn(d, "no ethernet alias\n");
> >> +		ret = -EINVAL;
> >> +		goto init_fail;
> >> +	}
> >
> > Apparently you try to rely on the alias to refer to a specific piece
> > of hardware, which is not correct. The alias is meant to be selectable
> > to match e.g. the numbering written on the external connector, which
> > is totally independent of the internal hardware.
> 
> Thanks for clarifying alisa.
> The id will be used for start channel in ppe, RX_DESC_NUM * priv->id;
> Is it suitable directly use id in the dts, or other name such as start-chan?

Yes, I think it's best to just pass the id the same way as the channel
number.
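For illustration only, here is a hypothetical sketch of how that could look in the dts, matching the probe code in the patch (the port node referenced by port-handle carries reg and channel directly, so no alias id is needed; all node names and addresses are made up, not the final binding):

```dts
/* Hypothetical sketch: port-handle points at a node that carries
 * both the port number (reg) and the start channel explicitly. */
ppe: ppe@28c0000 {
	compatible = "hisilicon,hip04-ppe";
	reg = <0x28c0000 0x10000>;

	eth0_port: port@1f {
		reg = <31>;		/* ppe port */
		channel = <0>;		/* start channel in the ppe */
	};
};

ge0: ethernet@2800000 {
	compatible = "hisilicon,hip04-mac";
	reg = <0x2800000 0x10000>;
	phy-mode = "sgmii";
	port-handle = <&eth0_port>;
};
```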

	Arnd

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-21 15:27     ` Arnd Bergmann
@ 2014-03-22  1:18       ` zhangfei
  -1 siblings, 0 replies; 148+ messages in thread
From: zhangfei @ 2014-03-22  1:18 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: David S. Miller, linux, f.fainelli, mark.rutland,
	sergei.shtylyov, linux-arm-kernel, netdev, devicetree

Dear Arnd

On 03/21/2014 11:27 PM, Arnd Bergmann wrote:
> On Friday 21 March 2014 23:09:30 Zhangfei Gao wrote:
>
>> +
>> +static void __iomem *ppebase;
>
> Any reason why you still have this, rather than using a separate
> driver for it as we discussed? If you have comments that you still
> plan to address, please mention those in the introductory mail,
> so you don't get the same review comments multiple times.

Sorry for my bad understanding.
I thought you agreed to use this.
Will check your earlier comments more carefully.

>
>
>> +static void hip04_tx_reclaim(struct net_device *ndev, bool force)
>> +{
>> +	struct hip04_priv *priv = netdev_priv(ndev);
>> +	unsigned tx_head = priv->tx_head;
>> +	unsigned tx_tail = priv->tx_tail;
>> +	struct tx_desc *desc = &priv->tx_desc[priv->tx_tail];
>> +
>> +	while (tx_tail != tx_head) {
>> +		if (desc->send_addr != 0) {
>> +			if (force)
>> +				desc->send_addr = 0;
>> +			else
>> +				break;
>> +		}
>> +		if (priv->tx_phys[tx_tail]) {
>> +			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
>> +				priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
>> +			priv->tx_phys[tx_tail] = 0;
>> +		}
>> +		dev_kfree_skb_irq(priv->tx_skb[tx_tail]);
>> +		priv->tx_skb[tx_tail] = NULL;
>> +		tx_tail = TX_NEXT(tx_tail);
>> +		priv->tx_count--;
>> +	}
>> +	priv->tx_tail = tx_tail;
>> +}
>
> You call this function from start_xmit(), which may be too early, causing the
> dma_unmap_single() and dev_kfree_skb_irq() functions to be called while the
> device is still accessing the data. This is bad.

There is a protection:
desc->send_addr is cleared to 0 by the hardware only after the xmit is done.

>
> You have to ensure that you only ever clean up tx buffers that have been
> successfully transmitted. Also, you should use an interrupt to notify you
> of this in case there is no further xmit packet. Otherwise you may have
> a user space program waiting indefinitely for a single packet to get sent
> on a socket.
>
> It's ok to also call the cleanup from start_xmit, but calling it from the
> poll() function or another appropriate place is required.

There is no transmit-done interrupt, so we rely on every xmit to reclaim
finished buffers.
I thought that would be enough, since there are TX_DESC_NUM descs.
It is a good idea to also put the reclaim in poll; that will add a spin lock etc.

>
>> +	priv->id = of_alias_get_id(node, "ethernet");
>> +	if (priv->id < 0) {
>> +		dev_warn(d, "no ethernet alias\n");
>> +		ret = -EINVAL;
>> +		goto init_fail;
>> +	}
>
> Apparently you try to rely on the alias to refer to a specific piece
> of hardware, which is not correct. The alias is meant to be selectable
> to match e.g. the numbering written on the external connector, which
> is totally independent of the internal hardware.

Thanks for clarifying the alias.
The id is used for the start channel in the ppe: RX_DESC_NUM * priv->id.
Is it suitable to use the id directly in the dts, or another name such as start-chan?

Thanks

^ permalink raw reply	[flat|nested] 148+ messages in thread

* Re: [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-21 15:09   ` Zhangfei Gao
@ 2014-03-21 15:27     ` Arnd Bergmann
  -1 siblings, 0 replies; 148+ messages in thread
From: Arnd Bergmann @ 2014-03-21 15:27 UTC (permalink / raw)
  To: Zhangfei Gao
  Cc: mark.rutland, devicetree, f.fainelli, linux, sergei.shtylyov,
	netdev, David S. Miller, linux-arm-kernel

On Friday 21 March 2014 23:09:30 Zhangfei Gao wrote:

> +
> +static void __iomem *ppebase;

Any reason why you still have this, rather than using a separate
driver for it as we discussed? If you have comments that you still
plan to address, please mention those in the introductory mail,
so you don't get the same review comments multiple times.


> +static void hip04_tx_reclaim(struct net_device *ndev, bool force)
> +{
> +	struct hip04_priv *priv = netdev_priv(ndev);
> +	unsigned tx_head = priv->tx_head;
> +	unsigned tx_tail = priv->tx_tail;
> +	struct tx_desc *desc = &priv->tx_desc[priv->tx_tail];
> +
> +	while (tx_tail != tx_head) {
> +		if (desc->send_addr != 0) {
> +			if (force)
> +				desc->send_addr = 0;
> +			else
> +				break;
> +		}
> +		if (priv->tx_phys[tx_tail]) {
> +			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
> +				priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
> +			priv->tx_phys[tx_tail] = 0;
> +		}
> +		dev_kfree_skb_irq(priv->tx_skb[tx_tail]);
> +		priv->tx_skb[tx_tail] = NULL;
> +		tx_tail = TX_NEXT(tx_tail);
> +		priv->tx_count--;
> +	}
> +	priv->tx_tail = tx_tail;
> +}

You call this function from start_xmit(), which may be too early, causing the
dma_unmap_single() and dev_kfree_skb_irq() functions to be called while the
device is still accessing the data. This is bad.

You have to ensure that you only ever clean up tx buffers that have been
successfully transmitted. Also, you should use an interrupt to notify you
of this in case there is no further xmit packet. Otherwise you may have
a user space program waiting indefinitely for a single packet to get sent
on a socket.

It's ok to also call the cleanup from start_xmit, but calling it from the
poll() function or another appropriate place is required.

> +	priv->id = of_alias_get_id(node, "ethernet");
> +	if (priv->id < 0) {
> +		dev_warn(d, "no ethernet alias\n");
> +		ret = -EINVAL;
> +		goto init_fail;
> +	}

Apparently you try to rely on the alias to refer to a specific piece
of hardware, which is not correct. The alias is meant to be selectable
to match e.g. the numbering written on the external connector, which
is totally independent of the internal hardware.

	Arnd

^ permalink raw reply	[flat|nested] 148+ messages in thread

* [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
  2014-03-21 15:09 [PATCH v2 " Zhangfei Gao
@ 2014-03-21 15:09   ` Zhangfei Gao
  0 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-21 15:09 UTC (permalink / raw)
  To: David S. Miller, linux, arnd, f.fainelli, mark.rutland, sergei.shtylyov
  Cc: linux-arm-kernel, netdev, devicetree, Zhangfei Gao

Add support for the Hisilicon hip04 ethernet driver, covering the 100M / 1000M controller

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
---
 drivers/net/ethernet/hisilicon/Makefile    |    2 +-
 drivers/net/ethernet/hisilicon/hip04_eth.c |  730 ++++++++++++++++++++++++++++
 2 files changed, 731 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c

diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 1d6eb6e..e6fe7af 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -2,4 +2,4 @@
 # Makefile for the HISILICON network device drivers.
 #
 
-obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
+obj-$(CONFIG_HIP04_ETH) += hip04_eth.o hip04_mdio.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
new file mode 100644
index 0000000..1211fb4
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -0,0 +1,730 @@
+
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+#include <linux/of_address.h>
+#include <linux/phy.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
+
+#define PPE_CFG_RX_CFF_ADDR		0x100
+#define PPE_CFG_POOL_GRP		0x300
+#define PPE_CFG_RX_BUF_SIZE		0x400
+#define PPE_CFG_RX_FIFO_SIZE		0x500
+#define PPE_CURR_BUF_CNT_REG		0xa200
+
+#define GE_DUPLEX_TYPE			0x8
+#define GE_MAX_FRM_SIZE_REG		0x3c
+#define GE_PORT_MODE			0x40
+#define GE_PORT_EN			0x44
+#define GE_SHORT_RUNTS_THR_REG		0x50
+#define GE_TX_LOCAL_PAGE_REG		0x5c
+#define GE_TRANSMIT_CONTROL_REG		0x60
+#define GE_CF_CRC_STRIP_REG		0x1b0
+#define GE_MODE_CHANGE_EN		0x1b4
+#define GE_RECV_CONTROL_REG		0x1e0
+#define GE_STATION_MAC_ADDRESS		0x210
+#define PPE_CFG_TX_PKT_BD_ADDR		0x420
+#define PPE_CFG_MAX_FRAME_LEN_REG	0x408
+#define PPE_CFG_BUS_CTRL_REG		0x424
+#define PPE_CFG_RX_CTRL_REG		0x428
+#define PPE_CFG_RX_PKT_MODE_REG		0x438
+#define PPE_CFG_QOS_VMID_GEN		0x500
+#define PPE_CFG_RX_PKT_INT		0x538
+#define PPE_INTEN			0x600
+#define PPE_INTSTS			0x608
+#define PPE_RINT			0x604
+#define PPE_CFG_STS_MODE		0x700
+#define PPE_HIS_RX_PKT_CNT		0x804
+
+/* REG_INTERRUPT */
+#define RCV_INT				BIT(10)
+#define RCV_NOBUF			BIT(8)
+#define DEF_INT_MASK			0x41fdf
+
+#define RX_DESC_NUM			64
+#define TX_DESC_NUM			64
+#define TX_NEXT(N)			(((N) + 1) & (TX_DESC_NUM-1))
+#define RX_NEXT(N)			(((N) + 1) & (RX_DESC_NUM-1))
+
+#define GMAC_PPE_RX_PKT_MAX_LEN		379
+#define GMAC_MAX_PKT_LEN		1516
+#define DESC_DEF_CFG			0x14
+#define RX_BUF_SIZE			1600
+#define RX_PKT_ERR			0x3
+#define TX_TIMEOUT			(6 * HZ)
+
+#define DRV_NAME			"hip04-ether"
+
+struct tx_desc {
+	u32 send_addr;
+	u16 reserved_16;
+	u16 send_size;
+	u32 reserved_32;
+	u32 cfg;
+	u32 wb_addr;
+} ____cacheline_aligned;
+
+struct rx_desc {
+	u16 reserved_16;
+	u16 pkt_len;
+	u32 reserve1[3];
+	u32 pkt_err;
+	u32 reserve2[4];
+};
+
+struct hip04_priv {
+	void __iomem *base;
+	int phy_mode;
+	int id;
+	unsigned int port;
+	unsigned int speed;
+	unsigned int duplex;
+	unsigned int reg_inten;
+
+	struct napi_struct napi;
+	struct net_device *ndev;
+
+	struct tx_desc *tx_desc;
+	dma_addr_t tx_desc_dma;
+	struct sk_buff *tx_skb[TX_DESC_NUM];
+	dma_addr_t tx_phys[TX_DESC_NUM];
+	unsigned int tx_head;
+	unsigned int tx_tail;
+	unsigned int tx_count;
+
+	unsigned char *rx_buf[RX_DESC_NUM];
+	dma_addr_t rx_phys[RX_DESC_NUM];
+	unsigned int rx_head;
+	unsigned int rx_buf_size;
+
+	struct device_node *phy_node;
+	struct phy_device *phy;
+};
+
+static void __iomem *ppebase;
+
+static void hip04_config_port(struct hip04_priv *priv, u32 speed, u32 duplex)
+{
+	u32 val;
+
+	priv->speed = speed;
+	priv->duplex = duplex;
+
+	switch (priv->phy_mode) {
+	case PHY_INTERFACE_MODE_SGMII:
+		if (speed == SPEED_1000)
+			val = 8;
+		else
+			val = 7;
+		break;
+	case PHY_INTERFACE_MODE_MII:
+		val = 1;	/* SPEED_100 */
+		break;
+	default:
+		val = 0;
+		break;
+	}
+	writel_relaxed(val, priv->base + GE_PORT_MODE);
+
+	val = (duplex) ? BIT(0) : 0;
+	writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
+
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_MODE_CHANGE_EN);
+}
+
+static void hip04_reset_ppe(struct hip04_priv *priv)
+{
+	u32 val;
+
+	do {
+		val = readl_relaxed(ppebase + priv->port * 4 +
+				    PPE_CURR_BUF_CNT_REG);
+		readl_relaxed(ppebase + priv->port * 4 + PPE_CFG_RX_CFF_ADDR);
+	} while (val & 0xfff);
+}
+
+static void hip04_config_fifo(struct hip04_priv *priv)
+{
+	u32 val;
+
+	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
+	val |= BIT(12);			/* PPE_HIS_RX_PKT_CNT read clear */
+	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
+
+	val = BIT(priv->port);
+	writel_relaxed(val, ppebase + priv->port * 4 + PPE_CFG_POOL_GRP);
+
+	val = priv->port << 8;
+	val |= BIT(14);
+	writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
+
+	val = RX_BUF_SIZE;
+	writel_relaxed(val, ppebase + priv->port * 4 + PPE_CFG_RX_BUF_SIZE);
+
+	val = RX_DESC_NUM << 16;	/* depth */
+	val |= BIT(11);			/* seq: first set first use */
+	val |= RX_DESC_NUM * priv->id;	/* start_addr */
+	writel_relaxed(val, ppebase + priv->port * 4 + PPE_CFG_RX_FIFO_SIZE);
+
+	/* pkt store format */
+	val = NET_IP_ALIGN << 11;	/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
+
+	/* following cfg required for 1000M */
+	/* pkt mode */
+	val = BIT(18);			/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
+
+	/* set bus ctrl */
+	val = BIT(14);			/* buffer locally release */
+	val |= BIT(0);			/* big endian */
+	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
+
+	/* set max pkt len, curtail if exceed */
+	val = GMAC_PPE_RX_PKT_MAX_LEN;	/* max buffer len */
+	writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
+
+	/* set max len of each pkt */
+	val = GMAC_MAX_PKT_LEN;		/* max buffer len */
+	writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
+
+	/* set min len of each pkt */
+	val = 31;			/* min buffer len */
+	writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
+
+	/* tx */
+	val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
+	val |= BIT(5);			/* tx auto neg */
+	val |= BIT(6);			/* tx add crc */
+	val |= BIT(7);			/* tx short pad through */
+	writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
+
+	/* rx crc */
+	val = BIT(0);			/* rx strip crc */
+	writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
+
+	/* rx */
+	val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
+	val |= BIT(3);			/* rx strip pad */
+	val |= BIT(4);			/* runt pkt en */
+	writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
+
+	/* auto neg control */
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
+}
+
+static void hip04_mac_enable(struct net_device *ndev, bool enable)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	if (enable) {
+		/* enable tx & rx */
+		val = readl_relaxed(priv->base + GE_PORT_EN);
+		val |= BIT(1);		/* rx*/
+		val |= BIT(2);		/* tx*/
+		writel_relaxed(val, priv->base + GE_PORT_EN);
+
+		/* enable interrupt */
+		priv->reg_inten = DEF_INT_MASK;
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+		/* clear rx int */
+		val = RCV_INT;
+		writel_relaxed(val, priv->base + PPE_RINT);
+
+		/* config recv int*/
+		val = BIT(6);		/* int threshold: 1 packet */
+		val |= 0x4;		/* recv timeout */
+		writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
+	} else {
+		/* disable int */
+		priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+		/* disable tx & rx */
+		val = readl_relaxed(priv->base + GE_PORT_EN);
+		val &= ~(BIT(1));	/* rx*/
+		val &= ~(BIT(2));	/* tx*/
+		writel_relaxed(val, priv->base + GE_PORT_EN);
+	}
+}
+
+static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	writel(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
+}
+
+static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	writel(phys, ppebase + priv->port * 4 + PPE_CFG_RX_CFF_ADDR);
+}
+
+static u32 hip04_recv_cnt(struct hip04_priv *priv)
+{
+	return readl(priv->base + PPE_HIS_RX_PKT_CNT);
+}
+
+static void hip04_update_mac_address(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
+			priv->base + GE_STATION_MAC_ADDRESS);
+	writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
+			(ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
+			priv->base + GE_STATION_MAC_ADDRESS + 4);
+}
+
+static int hip04_set_mac_address(struct net_device *ndev, void *addr)
+{
+	eth_mac_addr(ndev, addr);
+	hip04_update_mac_address(ndev);
+	return 0;
+}
+
+static int hip04_rx_poll(struct napi_struct *napi, int budget)
+{
+	struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
+	struct net_device *ndev = priv->ndev;
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int cnt = hip04_recv_cnt(priv);
+	struct sk_buff *skb;
+	struct rx_desc *desc;
+	unsigned char *buf;
+	dma_addr_t phys;
+	int rx = 0;
+	u16 len;
+	u32 err;
+
+	while (cnt) {
+		buf = priv->rx_buf[priv->rx_head];
+		skb = build_skb(buf, priv->rx_buf_size);
+		if (unlikely(!skb)) {
+			net_dbg_ratelimited("build_skb failed\n");
+			break;
+		}
+
+		dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_phys[priv->rx_head] = 0;
+
+		desc = (struct rx_desc *)skb->data;
+		len = be16_to_cpu(desc->pkt_len);
+		err = be32_to_cpu(desc->pkt_err);
+
+		if (len > RX_BUF_SIZE)
+			len = RX_BUF_SIZE;
+		if (len == 0)
+			break;
+
+		if (err & RX_PKT_ERR) {
+			dev_kfree_skb_any(skb);
+			stats->rx_dropped++;
+			stats->rx_errors++;
+		} else {
+			stats->rx_packets++;
+			stats->rx_bytes += len;
+
+			skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+			skb_put(skb, len);
+			skb->protocol = eth_type_trans(skb, ndev);
+			napi_gro_receive(&priv->napi, skb);
+		}
+
+		buf = netdev_alloc_frag(priv->rx_buf_size);
+		if (!buf)
+			return rx;	/* NAPI poll must not return an errno */
+		phys = dma_map_single(&ndev->dev, buf,
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_buf[priv->rx_head] = buf;
+		priv->rx_phys[priv->rx_head] = phys;
+		hip04_set_recv_desc(priv, phys);
+
+		priv->rx_head = RX_NEXT(priv->rx_head);
+		if (rx++ >= budget)
+			break;
+
+		if (--cnt == 0)
+			cnt = hip04_recv_cnt(priv);
+	}
+
+	if (rx < budget) {
+		napi_complete(napi);
+
+		/* enable rx interrupt */
+		priv->reg_inten |= RCV_INT | RCV_NOBUF;
+		writel(priv->reg_inten, priv->base + PPE_INTEN);
+	}
+
+	return rx;
+}
+
+static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
+{
+	struct net_device *ndev = (struct net_device *) dev_id;
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
+	u32 val = DEF_INT_MASK;
+
+	writel_relaxed(val, priv->base + PPE_RINT);
+
+	if (ists & (RCV_INT | RCV_NOBUF)) {
+		if (napi_schedule_prep(&priv->napi)) {
+			/* disable rx interrupt */
+			priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+			writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+			__napi_schedule(&priv->napi);
+		}
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void hip04_tx_reclaim(struct net_device *ndev, bool force)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	unsigned int tx_head = priv->tx_head;
+	unsigned int tx_tail = priv->tx_tail;
+	struct tx_desc *desc = &priv->tx_desc[priv->tx_tail];
+
+	while (tx_tail != tx_head) {
+		if (desc->send_addr != 0) {
+			if (force)
+				desc->send_addr = 0;
+			else
+				break;
+		}
+		if (priv->tx_phys[tx_tail]) {
+			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
+				priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
+			priv->tx_phys[tx_tail] = 0;
+		}
+		dev_kfree_skb_irq(priv->tx_skb[tx_tail]);
+		priv->tx_skb[tx_tail] = NULL;
+		tx_tail = TX_NEXT(tx_tail);
+		priv->tx_count--;
+	}
+	priv->tx_tail = tx_tail;
+}
+
+static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int tx_head = priv->tx_head;
+	struct tx_desc *desc = &priv->tx_desc[tx_head];
+	dma_addr_t phys;
+
+	hip04_tx_reclaim(ndev, false);
+
+	if (priv->tx_count >= TX_DESC_NUM) {
+		net_dbg_ratelimited("no TX space for packet\n");
+		netif_stop_queue(ndev);
+		return NETDEV_TX_BUSY;
+	}
+
+	phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
+	priv->tx_skb[tx_head] = skb;
+	priv->tx_phys[tx_head] = phys;
+	desc->send_addr = cpu_to_be32(phys);
+	desc->send_size = cpu_to_be16(skb->len);
+	desc->cfg = cpu_to_be32(DESC_DEF_CFG);
+	phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
+	desc->wb_addr = cpu_to_be32(phys);
+	skb_tx_timestamp(skb);
+	hip04_set_xmit_desc(priv, phys);
+	priv->tx_head = TX_NEXT(tx_head);
+	priv->tx_count++;
+
+	stats->tx_bytes += skb->len;
+	stats->tx_packets++;
+
+	return NETDEV_TX_OK;
+}
+
+static void hip04_adjust_link(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct phy_device *phy = priv->phy;
+
+	if ((priv->speed != phy->speed) || (priv->duplex != phy->duplex)) {
+		hip04_config_port(priv, phy->speed, phy->duplex);
+		phy_print_status(phy);
+	}
+}
+
+static int hip04_mac_open(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->rx_head = 0;
+	priv->tx_head = 0;
+	priv->tx_tail = 0;
+	priv->tx_count = 0;
+
+	hip04_reset_ppe(priv);
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		dma_addr_t phys;
+
+		phys = dma_map_single(&ndev->dev, priv->rx_buf[i],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_phys[i] = phys;
+		hip04_set_recv_desc(priv, phys);
+	}
+
+	if (priv->phy_node) {
+		priv->phy = of_phy_connect(ndev, priv->phy_node,
+			&hip04_adjust_link, 0, priv->phy_mode);
+		if (!priv->phy)
+			return -ENODEV;
+		phy_start(priv->phy);
+	}
+
+	netif_start_queue(ndev);
+	hip04_mac_enable(ndev, true);
+	napi_enable(&priv->napi);
+	return 0;
+}
+
+static int hip04_mac_stop(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	if (priv->phy)
+		phy_disconnect(priv->phy);
+
+	priv->phy = NULL;
+	napi_disable(&priv->napi);
+	netif_stop_queue(ndev);
+	hip04_mac_enable(ndev, false);
+	hip04_tx_reclaim(ndev, true);
+	hip04_reset_ppe(priv);
+
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		if (priv->rx_phys[i]) {
+			dma_unmap_single(&ndev->dev, priv->rx_phys[i],
+					RX_BUF_SIZE, DMA_FROM_DEVICE);
+			priv->rx_phys[i] = 0;
+		}
+	}
+
+	return 0;
+}
+
+static void hip04_timeout(struct net_device *ndev)
+{
+	netif_wake_queue(ndev);
+}
+
+static const struct net_device_ops hip04_netdev_ops = {
+	.ndo_open		= hip04_mac_open,
+	.ndo_stop		= hip04_mac_stop,
+	.ndo_start_xmit		= hip04_mac_start_xmit,
+	.ndo_set_mac_address	= hip04_set_mac_address,
+	.ndo_tx_timeout         = hip04_timeout,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_change_mtu		= eth_change_mtu,
+};
+
+static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->tx_desc = dma_alloc_coherent(d,
+			TX_DESC_NUM * sizeof(struct tx_desc),
+			&priv->tx_desc_dma, GFP_KERNEL);
+	if (!priv->tx_desc)
+		return -ENOMEM;
+
+	priv->rx_buf_size = RX_BUF_SIZE +
+			    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		priv->rx_buf[i] = netdev_alloc_frag(priv->rx_buf_size);
+		if (!priv->rx_buf[i])
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void hip04_free_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	for (i = 0; i < RX_DESC_NUM; i++)
+		if (priv->rx_buf[i])
+			put_page(virt_to_head_page(priv->rx_buf[i]));
+
+	for (i = 0; i < TX_DESC_NUM; i++)
+		if (priv->tx_skb[i])
+			dev_kfree_skb_any(priv->tx_skb[i]);
+
+	dma_free_coherent(d, TX_DESC_NUM * sizeof(struct tx_desc),
+			priv->tx_desc, priv->tx_desc_dma);
+}
+
+static int hip04_mac_probe(struct platform_device *pdev)
+{
+	struct device *d = &pdev->dev;
+	struct device_node *n, *node = d->of_node;
+	struct net_device *ndev;
+	struct hip04_priv *priv;
+	struct resource *res;
+	unsigned int irq;
+	int ret;
+
+	ndev = alloc_etherdev(sizeof(struct hip04_priv));
+	if (!ndev)
+		return -ENOMEM;
+
+	priv = netdev_priv(ndev);
+	priv->ndev = ndev;
+	platform_set_drvdata(pdev, ndev);
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_ioremap_resource(d, res);
+	if (IS_ERR(priv->base)) {
+		ret = PTR_ERR(priv->base);
+		goto init_fail;
+	}
+
+	if (!ppebase) {
+		n = of_find_compatible_node(NULL, NULL, "hisilicon,hip04-ppe");
+		if (!n) {
+			ret = -EINVAL;
+			netdev_err(ndev, "no hisilicon,hip04-ppe node found\n");
+			goto init_fail;
+		}
+		ppebase = of_iomap(n, 0);
+		of_node_put(n);
+		if (!ppebase) {
+			netdev_err(ndev, "of_iomap failed for hip04-ppe\n");
+			ret = -ENOMEM;
+			goto init_fail;
+		}
+	}
+
+	n = of_parse_phandle(node, "port-handle", 0);
+	if (n) {
+		ret = of_property_read_u32(n, "reg", &priv->port);
+		if (ret) {
+			dev_warn(d, "no reg info\n");
+			goto init_fail;
+		}
+	} else {
+		dev_warn(d, "no port-handle\n");
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	priv->id = of_alias_get_id(node, "ethernet");
+	if (priv->id < 0) {
+		dev_warn(d, "no ethernet alias\n");
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	priv->phy_mode = of_get_phy_mode(node);
+	if (priv->phy_mode < 0) {
+		dev_warn(d, "phy-mode not found\n");
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq <= 0) {
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	ether_setup(ndev);
+	ndev->netdev_ops = &hip04_netdev_ops;
+	ndev->watchdog_timeo = TX_TIMEOUT;
+	ndev->priv_flags |= IFF_UNICAST_FLT;
+	ndev->irq = irq;
+	netif_napi_add(ndev, &priv->napi, hip04_rx_poll, RX_DESC_NUM);
+	SET_NETDEV_DEV(ndev, &pdev->dev);
+
+	hip04_reset_ppe(priv);
+	if (priv->phy_mode == PHY_INTERFACE_MODE_MII)
+		hip04_config_port(priv, SPEED_100, DUPLEX_FULL);
+
+	hip04_config_fifo(priv);
+	random_ether_addr(ndev->dev_addr);
+	hip04_update_mac_address(ndev);
+
+	ret = hip04_alloc_ring(ndev, d);
+	if (ret) {
+		netdev_err(ndev, "alloc ring fail\n");
+		goto alloc_fail;
+	}
+
+	ret = devm_request_irq(d, irq, hip04_mac_interrupt,
+				0, pdev->name, ndev);
+	if (ret) {
+		netdev_err(ndev, "devm_request_irq failed\n");
+		goto alloc_fail;
+	}
+
+	priv->phy_node = of_parse_phandle(node, "phy-handle", 0);
+
+	ret = register_netdev(ndev);
+	if (ret)
+		goto alloc_fail;
+
+	return 0;
+alloc_fail:
+	hip04_free_ring(ndev, d);
+init_fail:
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+	return ret;
+}
+
+static int hip04_remove(struct platform_device *pdev)
+{
+	struct net_device *ndev = platform_get_drvdata(pdev);
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct device *d = &pdev->dev;
+
+	unregister_netdev(ndev);
+	hip04_free_ring(ndev, d);
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+
+	return 0;
+}
+
+static const struct of_device_id hip04_mac_match[] = {
+	{ .compatible = "hisilicon,hip04-mac" },
+	{ }
+};
+
+static struct platform_driver hip04_mac_driver = {
+	.probe	= hip04_mac_probe,
+	.remove	= hip04_remove,
+	.driver	= {
+		.name		= DRV_NAME,
+		.owner		= THIS_MODULE,
+		.of_match_table	= hip04_mac_match,
+	},
+};
+module_platform_driver(hip04_mac_driver);
+
+MODULE_DESCRIPTION("HISILICON P04 Ethernet driver");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:hip04-ether");
-- 
1.7.9.5


* [PATCH 3/3] net: hisilicon: new hip04 ethernet driver
@ 2014-03-21 15:09   ` Zhangfei Gao
  0 siblings, 0 replies; 148+ messages in thread
From: Zhangfei Gao @ 2014-03-21 15:09 UTC (permalink / raw)
  To: linux-arm-kernel

Add the Hisilicon hip04 Ethernet driver, supporting both the 100M and 1000M controllers.

Signed-off-by: Zhangfei Gao <zhangfei.gao@linaro.org>
---
 drivers/net/ethernet/hisilicon/Makefile    |    2 +-
 drivers/net/ethernet/hisilicon/hip04_eth.c |  730 ++++++++++++++++++++++++++++
 2 files changed, 731 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/hisilicon/hip04_eth.c

diff --git a/drivers/net/ethernet/hisilicon/Makefile b/drivers/net/ethernet/hisilicon/Makefile
index 1d6eb6e..e6fe7af 100644
--- a/drivers/net/ethernet/hisilicon/Makefile
+++ b/drivers/net/ethernet/hisilicon/Makefile
@@ -2,4 +2,4 @@
 # Makefile for the HISILICON network device drivers.
 #
 
-obj-$(CONFIG_HIP04_ETH) += hip04_mdio.o
+obj-$(CONFIG_HIP04_ETH) += hip04_eth.o hip04_mdio.o
diff --git a/drivers/net/ethernet/hisilicon/hip04_eth.c b/drivers/net/ethernet/hisilicon/hip04_eth.c
new file mode 100644
index 0000000..1211fb4
--- /dev/null
+++ b/drivers/net/ethernet/hisilicon/hip04_eth.c
@@ -0,0 +1,730 @@
+/* Copyright (c) 2014 Linaro Ltd.
+ * Copyright (c) 2014 Hisilicon Limited.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/etherdevice.h>
+#include <linux/platform_device.h>
+#include <linux/interrupt.h>
+#include <linux/of_address.h>
+#include <linux/phy.h>
+#include <linux/of_mdio.h>
+#include <linux/of_net.h>
+
+#define PPE_CFG_RX_CFF_ADDR		0x100
+#define PPE_CFG_POOL_GRP		0x300
+#define PPE_CFG_RX_BUF_SIZE		0x400
+#define PPE_CFG_RX_FIFO_SIZE		0x500
+#define PPE_CURR_BUF_CNT_REG		0xa200
+
+#define GE_DUPLEX_TYPE			0x8
+#define GE_MAX_FRM_SIZE_REG		0x3c
+#define GE_PORT_MODE			0x40
+#define GE_PORT_EN			0x44
+#define GE_SHORT_RUNTS_THR_REG		0x50
+#define GE_TX_LOCAL_PAGE_REG		0x5c
+#define GE_TRANSMIT_CONTROL_REG		0x60
+#define GE_CF_CRC_STRIP_REG		0x1b0
+#define GE_MODE_CHANGE_EN		0x1b4
+#define GE_RECV_CONTROL_REG		0x1e0
+#define GE_STATION_MAC_ADDRESS		0x210
+#define PPE_CFG_TX_PKT_BD_ADDR		0x420
+#define PPE_CFG_MAX_FRAME_LEN_REG	0x408
+#define PPE_CFG_BUS_CTRL_REG		0x424
+#define PPE_CFG_RX_CTRL_REG		0x428
+#define PPE_CFG_RX_PKT_MODE_REG		0x438
+#define PPE_CFG_QOS_VMID_GEN		0x500
+#define PPE_CFG_RX_PKT_INT		0x538
+#define PPE_INTEN			0x600
+#define PPE_INTSTS			0x608
+#define PPE_RINT			0x604
+#define PPE_CFG_STS_MODE		0x700
+#define PPE_HIS_RX_PKT_CNT		0x804
+
+/* REG_INTERRUPT */
+#define RCV_INT				BIT(10)
+#define RCV_NOBUF			BIT(8)
+#define DEF_INT_MASK			0x41fdf
+
+#define RX_DESC_NUM			64
+#define TX_DESC_NUM			64
+#define TX_NEXT(N)			(((N) + 1) & (TX_DESC_NUM-1))
+#define RX_NEXT(N)			(((N) + 1) & (RX_DESC_NUM-1))
+
+#define GMAC_PPE_RX_PKT_MAX_LEN		379
+#define GMAC_MAX_PKT_LEN		1516
+#define DESC_DEF_CFG			0x14
+#define RX_BUF_SIZE			1600
+#define RX_PKT_ERR			0x3
+#define TX_TIMEOUT			(6 * HZ)
+
+#define DRV_NAME			"hip04-ether"
+
+struct tx_desc {
+	u32 send_addr;
+	u16 reserved_16;
+	u16 send_size;
+	u32 reserved_32;
+	u32 cfg;
+	u32 wb_addr;
+} ____cacheline_aligned;
+
+struct rx_desc {
+	u16 reserved_16;
+	u16 pkt_len;
+	u32 reserve1[3];
+	u32 pkt_err;
+	u32 reserve2[4];
+};
+
+struct hip04_priv {
+	void __iomem *base;
+	int phy_mode;
+	int id;
+	unsigned int port;
+	unsigned int speed;
+	unsigned int duplex;
+	unsigned int reg_inten;
+
+	struct napi_struct napi;
+	struct net_device *ndev;
+
+	struct tx_desc *tx_desc;
+	dma_addr_t tx_desc_dma;
+	struct sk_buff *tx_skb[TX_DESC_NUM];
+	dma_addr_t tx_phys[TX_DESC_NUM];
+	unsigned int tx_head;
+	unsigned int tx_tail;
+	unsigned int tx_count;
+
+	unsigned char *rx_buf[RX_DESC_NUM];
+	dma_addr_t rx_phys[RX_DESC_NUM];
+	unsigned int rx_head;
+	unsigned int rx_buf_size;
+
+	struct device_node *phy_node;
+	struct phy_device *phy;
+};
+
+static void __iomem *ppebase;
+
+static void hip04_config_port(struct hip04_priv *priv, u32 speed, u32 duplex)
+{
+	u32 val;
+
+	priv->speed = speed;
+	priv->duplex = duplex;
+
+	switch (priv->phy_mode) {
+	case PHY_INTERFACE_MODE_SGMII:
+		if (speed == SPEED_1000)
+			val = 8;
+		else
+			val = 7;
+		break;
+	case PHY_INTERFACE_MODE_MII:
+		val = 1;	/* SPEED_100 */
+		break;
+	default:
+		val = 0;
+		break;
+	}
+	writel_relaxed(val, priv->base + GE_PORT_MODE);
+
+	val = (duplex) ? BIT(0) : 0;
+	writel_relaxed(val, priv->base + GE_DUPLEX_TYPE);
+
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_MODE_CHANGE_EN);
+}
+
+static void hip04_reset_ppe(struct hip04_priv *priv)
+{
+	u32 val;
+
+	do {
+		val =
+		readl_relaxed(ppebase + priv->port * 4 + PPE_CURR_BUF_CNT_REG);
+		readl_relaxed(ppebase + priv->port * 4 + PPE_CFG_RX_CFF_ADDR);
+	} while (val & 0xfff);
+}
+
+static void hip04_config_fifo(struct hip04_priv *priv)
+{
+	u32 val;
+
+	val = readl_relaxed(priv->base + PPE_CFG_STS_MODE);
+	val |= BIT(12);			/* PPE_HIS_RX_PKT_CNT read clear */
+	writel_relaxed(val, priv->base + PPE_CFG_STS_MODE);
+
+	val = BIT(priv->port);
+	writel_relaxed(val, ppebase + priv->port * 4 + PPE_CFG_POOL_GRP);
+
+	val = priv->port << 8;
+	val |= BIT(14);
+	writel_relaxed(val, priv->base + PPE_CFG_QOS_VMID_GEN);
+
+	val = RX_BUF_SIZE;
+	writel_relaxed(val, ppebase + priv->port * 4 + PPE_CFG_RX_BUF_SIZE);
+
+	val = RX_DESC_NUM << 16;	/* depth */
+	val |= BIT(11);			/* seq: first set first use */
+	val |= RX_DESC_NUM * priv->id;	/* start_addr */
+	writel_relaxed(val, ppebase + priv->port * 4 + PPE_CFG_RX_FIFO_SIZE);
+
+	/* pkt store format */
+	val = NET_IP_ALIGN << 11;	/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_CTRL_REG);
+
+	/* following cfg required for 1000M */
+	/* pkt mode */
+	val = BIT(18);			/* align */
+	writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_MODE_REG);
+
+	/* set bus ctrl */
+	val = BIT(14);			/* buffer locally release */
+	val |= BIT(0);			/* big endian */
+	writel_relaxed(val, priv->base + PPE_CFG_BUS_CTRL_REG);
+
+	/* set max pkt len, curtail if exceed */
+	val = GMAC_PPE_RX_PKT_MAX_LEN;	/* max buffer len */
+	writel_relaxed(val, priv->base + PPE_CFG_MAX_FRAME_LEN_REG);
+
+	/* set max len of each pkt */
+	val = GMAC_MAX_PKT_LEN;		/* max buffer len */
+	writel_relaxed(val, priv->base + GE_MAX_FRM_SIZE_REG);
+
+	/* set min len of each pkt */
+	val = 31;			/* min buffer len */
+	writel_relaxed(val, priv->base + GE_SHORT_RUNTS_THR_REG);
+
+	/* tx */
+	val = readl_relaxed(priv->base + GE_TRANSMIT_CONTROL_REG);
+	val |= BIT(5);			/* tx auto neg */
+	val |= BIT(6);			/* tx add crc */
+	val |= BIT(7);			/* tx short pad through */
+	writel_relaxed(val, priv->base + GE_TRANSMIT_CONTROL_REG);
+
+	/* rx crc */
+	val = BIT(0);			/* rx strip crc */
+	writel_relaxed(val, priv->base + GE_CF_CRC_STRIP_REG);
+
+	/* rx */
+	val = readl_relaxed(priv->base + GE_RECV_CONTROL_REG);
+	val |= BIT(3);			/* rx strip pad */
+	val |= BIT(4);			/* runt pkt en */
+	writel_relaxed(val, priv->base + GE_RECV_CONTROL_REG);
+
+	/* auto neg control */
+	val = BIT(0);
+	writel_relaxed(val, priv->base + GE_TX_LOCAL_PAGE_REG);
+}
+
+static void hip04_mac_enable(struct net_device *ndev, bool enable)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 val;
+
+	if (enable) {
+		/* enable tx & rx */
+		val = readl_relaxed(priv->base + GE_PORT_EN);
+		val |= BIT(1);		/* rx*/
+		val |= BIT(2);		/* tx*/
+		writel_relaxed(val, priv->base + GE_PORT_EN);
+
+		/* enable interrupt */
+		priv->reg_inten = DEF_INT_MASK;
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+		/* clear rx int */
+		val = RCV_INT;
+		writel_relaxed(val, priv->base + PPE_RINT);
+
+		/* config recv int*/
+		val = BIT(6);		/* int threshold: 1 packet */
+		val |= 0x4;		/* recv timeout */
+		writel_relaxed(val, priv->base + PPE_CFG_RX_PKT_INT);
+	} else {
+		/* disable int */
+		priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+		writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+
+		/* disable tx & rx */
+		val = readl_relaxed(priv->base + GE_PORT_EN);
+		val &= ~(BIT(1));	/* rx*/
+		val &= ~(BIT(2));	/* tx*/
+		writel_relaxed(val, priv->base + GE_PORT_EN);
+	}
+}
+
+static void hip04_set_xmit_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	writel(phys, priv->base + PPE_CFG_TX_PKT_BD_ADDR);
+}
+
+static void hip04_set_recv_desc(struct hip04_priv *priv, dma_addr_t phys)
+{
+	writel(phys, ppebase + priv->port * 4 + PPE_CFG_RX_CFF_ADDR);
+}
+
+static u32 hip04_recv_cnt(struct hip04_priv *priv)
+{
+	return readl(priv->base + PPE_HIS_RX_PKT_CNT);
+}
+
+static void hip04_update_mac_address(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+
+	writel_relaxed(((ndev->dev_addr[0] << 8) | (ndev->dev_addr[1])),
+			priv->base + GE_STATION_MAC_ADDRESS);
+	writel_relaxed(((ndev->dev_addr[2] << 24) | (ndev->dev_addr[3] << 16) |
+			(ndev->dev_addr[4] << 8) | (ndev->dev_addr[5])),
+			priv->base + GE_STATION_MAC_ADDRESS + 4);
+}
+
+static int hip04_set_mac_address(struct net_device *ndev, void *addr)
+{
+	eth_mac_addr(ndev, addr);
+	hip04_update_mac_address(ndev);
+	return 0;
+}
+
+static int hip04_rx_poll(struct napi_struct *napi, int budget)
+{
+	struct hip04_priv *priv = container_of(napi, struct hip04_priv, napi);
+	struct net_device *ndev = priv->ndev;
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int cnt = hip04_recv_cnt(priv);
+	struct sk_buff *skb;
+	struct rx_desc *desc;
+	unsigned char *buf;
+	dma_addr_t phys;
+	int rx = 0;
+	u16 len;
+	u32 err;
+
+	while (cnt) {
+		buf = priv->rx_buf[priv->rx_head];
+		skb = build_skb(buf, priv->rx_buf_size);
+		if (unlikely(!skb)) {
+			net_dbg_ratelimited("build_skb failed\n");
+			break;
+		}
+
+		dma_unmap_single(&ndev->dev, priv->rx_phys[priv->rx_head],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_phys[priv->rx_head] = 0;
+
+		desc = (struct rx_desc *)skb->data;
+		len = be16_to_cpu(desc->pkt_len);
+		err = be32_to_cpu(desc->pkt_err);
+
+		if (len > RX_BUF_SIZE)
+			len = RX_BUF_SIZE;
+		if (len == 0)
+			break;
+
+		if (err & RX_PKT_ERR) {
+			dev_kfree_skb_any(skb);
+			stats->rx_dropped++;
+			stats->rx_errors++;
+		} else {
+			stats->rx_packets++;
+			stats->rx_bytes += len;
+
+			skb_reserve(skb, NET_SKB_PAD + NET_IP_ALIGN);
+			skb_put(skb, len);
+			skb->protocol = eth_type_trans(skb, ndev);
+			napi_gro_receive(&priv->napi, skb);
+		}
+
+		buf = netdev_alloc_frag(priv->rx_buf_size);
+		if (!buf)
+			return rx;	/* NAPI poll must not return an errno */
+		phys = dma_map_single(&ndev->dev, buf,
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_buf[priv->rx_head] = buf;
+		priv->rx_phys[priv->rx_head] = phys;
+		hip04_set_recv_desc(priv, phys);
+
+		priv->rx_head = RX_NEXT(priv->rx_head);
+		if (rx++ >= budget)
+			break;
+
+		if (--cnt == 0)
+			cnt = hip04_recv_cnt(priv);
+	}
+
+	if (rx < budget) {
+		napi_complete(napi);
+
+		/* enable rx interrupt */
+		priv->reg_inten |= RCV_INT | RCV_NOBUF;
+		writel(priv->reg_inten, priv->base + PPE_INTEN);
+	}
+
+	return rx;
+}
+
+static irqreturn_t hip04_mac_interrupt(int irq, void *dev_id)
+{
+	struct net_device *ndev = (struct net_device *) dev_id;
+	struct hip04_priv *priv = netdev_priv(ndev);
+	u32 ists = readl_relaxed(priv->base + PPE_INTSTS);
+	u32 val = DEF_INT_MASK;
+
+	writel_relaxed(val, priv->base + PPE_RINT);
+
+	if (ists & (RCV_INT | RCV_NOBUF)) {
+		if (napi_schedule_prep(&priv->napi)) {
+			/* disable rx interrupt */
+			priv->reg_inten &= ~(RCV_INT | RCV_NOBUF);
+			writel_relaxed(priv->reg_inten, priv->base + PPE_INTEN);
+			__napi_schedule(&priv->napi);
+		}
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void hip04_tx_reclaim(struct net_device *ndev, bool force)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	unsigned int tx_head = priv->tx_head;
+	unsigned int tx_tail = priv->tx_tail;
+	struct tx_desc *desc = &priv->tx_desc[priv->tx_tail];
+
+	while (tx_tail != tx_head) {
+		if (desc->send_addr != 0) {
+			if (force)
+				desc->send_addr = 0;
+			else
+				break;
+		}
+		if (priv->tx_phys[tx_tail]) {
+			dma_unmap_single(&ndev->dev, priv->tx_phys[tx_tail],
+				priv->tx_skb[tx_tail]->len, DMA_TO_DEVICE);
+			priv->tx_phys[tx_tail] = 0;
+		}
+		dev_kfree_skb_irq(priv->tx_skb[tx_tail]);
+		priv->tx_skb[tx_tail] = NULL;
+		tx_tail = TX_NEXT(tx_tail);
+		priv->tx_count--;
+	}
+	priv->tx_tail = tx_tail;
+}
+
+static int hip04_mac_start_xmit(struct sk_buff *skb, struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct net_device_stats *stats = &ndev->stats;
+	unsigned int tx_head = priv->tx_head;
+	struct tx_desc *desc = &priv->tx_desc[tx_head];
+	dma_addr_t phys;
+
+	hip04_tx_reclaim(ndev, false);
+
+	if (priv->tx_count >= TX_DESC_NUM) {
+		net_dbg_ratelimited("no TX space for packet\n");
+		netif_stop_queue(ndev);
+		return NETDEV_TX_BUSY;
+	}
+
+	phys = dma_map_single(&ndev->dev, skb->data, skb->len, DMA_TO_DEVICE);
+	priv->tx_skb[tx_head] = skb;
+	priv->tx_phys[tx_head] = phys;
+	desc->send_addr = cpu_to_be32(phys);
+	desc->send_size = cpu_to_be16(skb->len);
+	desc->cfg = cpu_to_be32(DESC_DEF_CFG);
+	phys = priv->tx_desc_dma + tx_head * sizeof(struct tx_desc);
+	desc->wb_addr = cpu_to_be32(phys);
+	skb_tx_timestamp(skb);
+	hip04_set_xmit_desc(priv, phys);
+	priv->tx_head = TX_NEXT(tx_head);
+	priv->tx_count++;
+
+	stats->tx_bytes += skb->len;
+	stats->tx_packets++;
+
+	return NETDEV_TX_OK;
+}
+
+static void hip04_adjust_link(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct phy_device *phy = priv->phy;
+
+	if ((priv->speed != phy->speed) || (priv->duplex != phy->duplex)) {
+		hip04_config_port(priv, phy->speed, phy->duplex);
+		phy_print_status(phy);
+	}
+}
+
+static int hip04_mac_open(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->rx_head = 0;
+	priv->tx_head = 0;
+	priv->tx_tail = 0;
+	priv->tx_count = 0;
+
+	hip04_reset_ppe(priv);
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		dma_addr_t phys;
+
+		phys = dma_map_single(&ndev->dev, priv->rx_buf[i],
+				RX_BUF_SIZE, DMA_FROM_DEVICE);
+		priv->rx_phys[i] = phys;
+		hip04_set_recv_desc(priv, phys);
+	}
+
+	if (priv->phy_node) {
+		priv->phy = of_phy_connect(ndev, priv->phy_node,
+			&hip04_adjust_link, 0, priv->phy_mode);
+		if (!priv->phy)
+			return -ENODEV;
+		phy_start(priv->phy);
+	}
+
+	netif_start_queue(ndev);
+	hip04_mac_enable(ndev, true);
+	napi_enable(&priv->napi);
+	return 0;
+}
+
+static int hip04_mac_stop(struct net_device *ndev)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	if (priv->phy)
+		phy_disconnect(priv->phy);
+
+	priv->phy = NULL;
+	napi_disable(&priv->napi);
+	netif_stop_queue(ndev);
+	hip04_mac_enable(ndev, false);
+	hip04_tx_reclaim(ndev, true);
+	hip04_reset_ppe(priv);
+
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		if (priv->rx_phys[i]) {
+			dma_unmap_single(&ndev->dev, priv->rx_phys[i],
+					RX_BUF_SIZE, DMA_FROM_DEVICE);
+			priv->rx_phys[i] = 0;
+		}
+	}
+
+	return 0;
+}
+
+static void hip04_timeout(struct net_device *ndev)
+{
+	netif_wake_queue(ndev);
+}
+
+static const struct net_device_ops hip04_netdev_ops = {
+	.ndo_open		= hip04_mac_open,
+	.ndo_stop		= hip04_mac_stop,
+	.ndo_start_xmit		= hip04_mac_start_xmit,
+	.ndo_set_mac_address	= hip04_set_mac_address,
+	.ndo_tx_timeout         = hip04_timeout,
+	.ndo_validate_addr	= eth_validate_addr,
+	.ndo_change_mtu		= eth_change_mtu,
+};
+
+static int hip04_alloc_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	priv->tx_desc = dma_alloc_coherent(d,
+			TX_DESC_NUM * sizeof(struct tx_desc),
+			&priv->tx_desc_dma, GFP_KERNEL);
+	if (!priv->tx_desc)
+		return -ENOMEM;
+
+	priv->rx_buf_size = RX_BUF_SIZE +
+			    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	for (i = 0; i < RX_DESC_NUM; i++) {
+		priv->rx_buf[i] = netdev_alloc_frag(priv->rx_buf_size);
+		if (!priv->rx_buf[i])
+			return -ENOMEM;
+	}
+
+	return 0;
+}
+
+static void hip04_free_ring(struct net_device *ndev, struct device *d)
+{
+	struct hip04_priv *priv = netdev_priv(ndev);
+	int i;
+
+	for (i = 0; i < RX_DESC_NUM; i++)
+		if (priv->rx_buf[i])
+			put_page(virt_to_head_page(priv->rx_buf[i]));
+
+	for (i = 0; i < TX_DESC_NUM; i++)
+		if (priv->tx_skb[i])
+			dev_kfree_skb_any(priv->tx_skb[i]);
+
+	dma_free_coherent(d, TX_DESC_NUM * sizeof(struct tx_desc),
+			priv->tx_desc, priv->tx_desc_dma);
+}
+
+static int hip04_mac_probe(struct platform_device *pdev)
+{
+	struct device *d = &pdev->dev;
+	struct device_node *n, *node = d->of_node;
+	struct net_device *ndev;
+	struct hip04_priv *priv;
+	struct resource *res;
+	unsigned int irq;
+	int ret;
+
+	ndev = alloc_etherdev(sizeof(struct hip04_priv));
+	if (!ndev)
+		return -ENOMEM;
+
+	priv = netdev_priv(ndev);
+	priv->ndev = ndev;
+	platform_set_drvdata(pdev, ndev);
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	priv->base = devm_ioremap_resource(d, res);
+	if (IS_ERR(priv->base)) {
+		ret = PTR_ERR(priv->base);
+		goto init_fail;
+	}
+
+	if (!ppebase) {
+		n = of_find_compatible_node(NULL, NULL, "hisilicon,hip04-ppe");
+		if (!n) {
+			ret = -EINVAL;
+			netdev_err(ndev, "no hisilicon,hip04-ppe node found\n");
+			goto init_fail;
+		}
+		ppebase = of_iomap(n, 0);
+		of_node_put(n);
+		if (!ppebase) {
+			netdev_err(ndev, "of_iomap failed for hip04-ppe\n");
+			ret = -ENOMEM;
+			goto init_fail;
+		}
+	}
+
+	n = of_parse_phandle(node, "port-handle", 0);
+	if (n) {
+		ret = of_property_read_u32(n, "reg", &priv->port);
+		if (ret) {
+			dev_warn(d, "no reg info\n");
+			goto init_fail;
+		}
+	} else {
+		dev_warn(d, "no port-handle\n");
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	priv->id = of_alias_get_id(node, "ethernet");
+	if (priv->id < 0) {
+		dev_warn(d, "no ethernet alias\n");
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	priv->phy_mode = of_get_phy_mode(node);
+	if (priv->phy_mode < 0) {
+		dev_warn(d, "phy-mode not found\n");
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq <= 0) {
+		ret = -EINVAL;
+		goto init_fail;
+	}
+
+	ether_setup(ndev);
+	ndev->netdev_ops = &hip04_netdev_ops;
+	ndev->watchdog_timeo = TX_TIMEOUT;
+	ndev->priv_flags |= IFF_UNICAST_FLT;
+	ndev->irq = irq;
+	netif_napi_add(ndev, &priv->napi, hip04_rx_poll, RX_DESC_NUM);
+	SET_NETDEV_DEV(ndev, &pdev->dev);
+
+	hip04_reset_ppe(priv);
+	if (priv->phy_mode == PHY_INTERFACE_MODE_MII)
+		hip04_config_port(priv, SPEED_100, DUPLEX_FULL);
+
+	hip04_config_fifo(priv);
+	random_ether_addr(ndev->dev_addr);
+	hip04_update_mac_address(ndev);
+
+	ret = hip04_alloc_ring(ndev, d);
+	if (ret) {
+		netdev_err(ndev, "alloc ring fail\n");
+		goto alloc_fail;
+	}
+
+	ret = devm_request_irq(d, irq, hip04_mac_interrupt,
+				0, pdev->name, ndev);
+	if (ret) {
+		netdev_err(ndev, "devm_request_irq failed\n");
+		goto alloc_fail;
+	}
+
+	priv->phy_node = of_parse_phandle(node, "phy-handle", 0);
+
+	ret = register_netdev(ndev);
+	if (ret)
+		goto alloc_fail;
+
+	return 0;
+alloc_fail:
+	hip04_free_ring(ndev, d);
+init_fail:
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+	return ret;
+}
+
+static int hip04_remove(struct platform_device *pdev)
+{
+	struct net_device *ndev = platform_get_drvdata(pdev);
+	struct hip04_priv *priv = netdev_priv(ndev);
+	struct device *d = &pdev->dev;
+
+	unregister_netdev(ndev);
+	hip04_free_ring(ndev, d);
+	of_node_put(priv->phy_node);
+	free_netdev(ndev);
+
+	return 0;
+}
+
+static const struct of_device_id hip04_mac_match[] = {
+	{ .compatible = "hisilicon,hip04-mac" },
+	{ }
+};
+
+static struct platform_driver hip04_mac_driver = {
+	.probe	= hip04_mac_probe,
+	.remove	= hip04_remove,
+	.driver	= {
+		.name		= DRV_NAME,
+		.owner		= THIS_MODULE,
+		.of_match_table	= hip04_mac_match,
+	},
+};
+module_platform_driver(hip04_mac_driver);
+
+MODULE_DESCRIPTION("HISILICON P04 Ethernet driver");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS("platform:hip04-ether");
-- 
1.7.9.5


end of thread, other threads:[~2014-04-18 13:18 UTC | newest]

Thread overview: 148+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-18  8:40 [PATCH 0/3] add hisilicon hip04 ethernet driver Zhangfei Gao
2014-03-18  8:40 ` [PATCH 1/3] Documentation: add Device tree bindings for Hisilicon hip04 ethernet Zhangfei Gao
2014-03-18 12:34   ` Mark Rutland
2014-03-21 12:59     ` Zhangfei Gao
2014-03-18 12:51   ` Sergei Shtylyov
2014-03-21 13:04     ` Zhangfei Gao
2014-03-18 17:39   ` Florian Fainelli
2014-03-20 11:29     ` Zhangfei Gao
2014-03-18  8:40 ` [PATCH 2/3] net: hisilicon: new hip04 MDIO driver Zhangfei Gao
2014-03-18 17:28   ` Florian Fainelli
2014-03-20 10:53     ` Zhangfei Gao
2014-03-20 17:59       ` Florian Fainelli
2014-03-21  5:27         ` Zhangfei Gao
2014-03-18  8:40 ` [PATCH 3/3] net: hisilicon: new hip04 ethernet driver Zhangfei Gao
2014-03-18 10:46   ` Russell King - ARM Linux
2014-03-20  9:51     ` Zhangfei Gao
2014-03-24 14:17       ` Rob Herring
2014-03-26 14:22         ` Zhangfei Gao
2014-03-18 11:25   ` Arnd Bergmann
2014-03-20 14:00     ` Zhangfei Gao
2014-03-20 14:31       ` Arnd Bergmann
2014-03-21  5:19         ` Zhangfei Gao
2014-03-21  7:37           ` Arnd Bergmann
2014-03-21  7:56             ` Zhangfei Gao
2014-03-24  8:17         ` Zhangfei Gao
2014-03-24 10:02           ` Arnd Bergmann
2014-03-24 13:23             ` Zhangfei Gao
2014-03-18 10:27 ` [PATCH 0/3] add hisilicon hip04 ethernet driver Ding Tianhong
2014-03-21 15:09 [PATCH v2 0/3] add hisilicon hip04 ethernet driver Zhangfei Gao
2014-03-21 15:09 ` [PATCH 3/3] net: hisilicon: new hip04 ethernet driver Zhangfei Gao
2014-03-21 15:27   ` Arnd Bergmann
2014-03-22  1:18     ` zhangfei
2014-03-22  8:08       ` Arnd Bergmann
2014-03-24 14:14 [PATCH v3 0/3] add hisilicon hip04 ethernet driver Zhangfei Gao
2014-03-24 14:14 ` [PATCH 3/3] net: hisilicon: new hip04 ethernet driver Zhangfei Gao
2014-03-24 15:18   ` Arnd Bergmann
2014-03-25  4:06     ` Zhangfei Gao
2014-03-25  8:12       ` Arnd Bergmann
2014-03-25 17:00         ` Florian Fainelli
2014-03-25 17:05           ` Arnd Bergmann
2014-03-25 17:16             ` Florian Fainelli
2014-03-25 17:57               ` Arnd Bergmann
2014-03-26  9:55                 ` David Laight
2014-03-25 17:17             ` David Laight
2014-03-25 17:21             ` Eric Dumazet
2014-03-25 17:54               ` Arnd Bergmann
2014-03-27 12:53                 ` zhangfei
2014-03-24 16:32   ` Florian Fainelli
2014-03-24 17:23     ` Arnd Bergmann
2014-03-24 17:35       ` Florian Fainelli
2014-03-27  6:27     ` Zhangfei Gao
2014-03-28 15:35 [PATCH v4 0/3] add hisilicon hip04 ethernet driver Zhangfei Gao
2014-03-28 15:36 ` [PATCH 3/3] net: hisilicon: new hip04 ethernet driver Zhangfei Gao
2014-04-01 13:27 [PATCH v5 0/3] add hisilicon hip04 ethernet driver Zhangfei Gao
2014-04-01 13:27 ` [PATCH 3/3] net: hisilicon: new hip04 ethernet driver Zhangfei Gao
2014-04-02  9:21   ` Arnd Bergmann
2014-04-02  9:51     ` zhangfei
2014-04-02 15:24       ` Arnd Bergmann
2014-04-02 10:04     ` David Laight
2014-04-02 15:49       ` Arnd Bergmann
2014-04-03  6:24         ` Zhangfei Gao
2014-04-03  8:35           ` Arnd Bergmann
2014-04-03 15:22       ` David Miller
2014-04-03 15:38       ` zhangfei
2014-04-03 15:27     ` Russell King - ARM Linux
2014-04-03 15:42       ` David Laight
2014-04-03 15:50         ` Russell King - ARM Linux
2014-04-03 17:57       ` Arnd Bergmann
2014-04-04  6:52       ` Zhangfei Gao
2014-04-04 15:16 [PATCH v6 0/3] add hisilicon hip04 ethernet driver Zhangfei Gao
2014-04-04 15:16 ` [PATCH 3/3] net: hisilicon: new hip04 ethernet driver Zhangfei Gao
2014-04-05  4:35 [PATCH v7 0/3] add hisilicon hip04 ethernet driver Zhangfei Gao
2014-04-05  4:35 ` [PATCH 3/3] net: hisilicon: new hip04 ethernet driver Zhangfei Gao
2014-04-07 18:53   ` David Miller
2014-04-08  8:07     ` zhangfei
2014-04-08  8:30       ` David Laight
2014-04-08  9:42         ` Arnd Bergmann
2014-04-08 14:47         ` zhangfei
2014-04-18 13:17     ` zhangfei
2014-04-07 18:56   ` David Miller
