All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4 net-next] net: phy: add Generic Netlink switch configuration API
@ 2013-10-22 18:23 Florian Fainelli
  2013-10-22 18:23 ` [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet " Florian Fainelli
                   ` (3 more replies)
  0 siblings, 4 replies; 41+ messages in thread
From: Florian Fainelli @ 2013-10-22 18:23 UTC (permalink / raw)
  To: netdev; +Cc: davem, s.hauer, nbd, blogic, jogo, gary, Florian Fainelli

Hi all,

This patchset aims at providing Linux with a simple and extensible Ethernet
switch configuration API. It comes with two drivers, one for Broadcom BCM53xx
aka Roboswitch and a simulation/fake one to help developpers.

These patches have been used in OpenWrt since 2008 to drive various
Ethernet switches commonly found in small home and office wireless routers.

The documentation part of the patch explains why this solution was developped
in contrast to using DSA, user-space or something else deeply in details, but
to summarize:

- DSA is Marvell centric and messes up with the actual Ethernet frames, its
  control path is scattered around bridge, ethtool and iproute...

- swconfig is focused on providing a standard control path for Ethernet
  switches out there from $vendor and is extensible without core kernel
  changes and user-space changes too thanks to its netlink interface

Florian Fainelli (3):
  net: phy: add Generic Netlink Ethernet switch configuration API
  tools: add Generic Netlink switch configuration tool
  net: phy: add fake switch driver

Jonas Gorski (1):
  net: phy: add Broadcom B53 switch driver

 Documentation/networking/swconfig.txt |  162 ++++
 MAINTAINERS                           |   11 +
 drivers/net/phy/Kconfig               |   16 +
 drivers/net/phy/Makefile              |    3 +
 drivers/net/phy/b53/Kconfig           |   25 +
 drivers/net/phy/b53/Makefile          |    8 +
 drivers/net/phy/b53/b53_common.c      | 1336 +++++++++++++++++++++++++++++++++
 drivers/net/phy/b53/b53_mdio.c        |  425 +++++++++++
 drivers/net/phy/b53/b53_phy_fixup.c   |   55 ++
 drivers/net/phy/b53/b53_priv.h        |  282 +++++++
 drivers/net/phy/b53/b53_regs.h        |  311 ++++++++
 drivers/net/phy/b53/b53_spi.c         |  329 ++++++++
 drivers/net/phy/swconfig-hwsim.c      |  230 ++++++
 drivers/net/phy/swconfig.c            | 1078 ++++++++++++++++++++++++++
 include/linux/platform_data/b53.h     |   32 +
 include/linux/swconfig.h              |  180 +++++
 include/uapi/linux/Kbuild             |    1 +
 include/uapi/linux/swconfig.h         |  103 +++
 tools/Makefile                        |   10 +-
 tools/swconfig/.gitignore             |    2 +
 tools/swconfig/Makefile               |   15 +
 tools/swconfig/cli.c                  |  328 ++++++++
 tools/swconfig/swlib.c                |  786 +++++++++++++++++++
 tools/swconfig/swlib.h                |  244 ++++++
 24 files changed, 5971 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/networking/swconfig.txt
 create mode 100644 drivers/net/phy/b53/Kconfig
 create mode 100644 drivers/net/phy/b53/Makefile
 create mode 100644 drivers/net/phy/b53/b53_common.c
 create mode 100644 drivers/net/phy/b53/b53_mdio.c
 create mode 100644 drivers/net/phy/b53/b53_phy_fixup.c
 create mode 100644 drivers/net/phy/b53/b53_priv.h
 create mode 100644 drivers/net/phy/b53/b53_regs.h
 create mode 100644 drivers/net/phy/b53/b53_spi.c
 create mode 100644 drivers/net/phy/swconfig-hwsim.c
 create mode 100644 drivers/net/phy/swconfig.c
 create mode 100644 include/linux/platform_data/b53.h
 create mode 100644 include/linux/swconfig.h
 create mode 100644 include/uapi/linux/swconfig.h
 create mode 100644 tools/swconfig/.gitignore
 create mode 100644 tools/swconfig/Makefile
 create mode 100644 tools/swconfig/cli.c
 create mode 100644 tools/swconfig/swlib.c
 create mode 100644 tools/swconfig/swlib.h

-- 
1.8.3.2

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-22 18:23 [PATCH 0/4 net-next] net: phy: add Generic Netlink switch configuration API Florian Fainelli
@ 2013-10-22 18:23 ` Florian Fainelli
  2013-10-22 19:22   ` Dan Williams
  2013-10-22 19:53   ` John Fastabend
  2013-10-22 18:23 ` [PATCH 2/4 net-next] tools: add Generic Netlink switch configuration tool Florian Fainelli
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 41+ messages in thread
From: Florian Fainelli @ 2013-10-22 18:23 UTC (permalink / raw)
  To: netdev; +Cc: davem, s.hauer, nbd, blogic, jogo, gary, Florian Fainelli

This patch adds an Ethernet Switch generic netlink configuration API
which allows for doing the required configuration of managed Ethernet
switches commonly found in Wireless/Cable/DSL routers in the market.

Since this API is based on the Generic Netlink infrastructure it is very
easy to extend a particular switch driver to support additional features
and to adapt it to specific switches.

So far the API includes support for:

- getting/setting a port VLAN id
- getting/setting VLAN port membership
- getting a port link status
- getting a port statistics counters
- resetting a switch device
- applying a configuration to a switch device

Unlike the Distributed Switch Architecture code, this API is much
smaller and does not interfere with the networking stack packet flow, but
rather focuses on the control path of managed switches.

A concrete example of a switch driver is included in subsequent patches
to illustrate how it can be used as well as the required user-space
controlling tools.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John Crispin <blogic@openwrt.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 Documentation/networking/swconfig.txt |  162 +++++
 MAINTAINERS                           |   10 +
 drivers/net/phy/Kconfig               |    6 +
 drivers/net/phy/Makefile              |    1 +
 drivers/net/phy/swconfig.c            | 1078 +++++++++++++++++++++++++++++++++
 include/linux/swconfig.h              |  180 ++++++
 include/uapi/linux/Kbuild             |    1 +
 include/uapi/linux/swconfig.h         |  103 ++++
 8 files changed, 1541 insertions(+)
 create mode 100644 Documentation/networking/swconfig.txt
 create mode 100644 drivers/net/phy/swconfig.c
 create mode 100644 include/linux/swconfig.h
 create mode 100644 include/uapi/linux/swconfig.h

diff --git a/Documentation/networking/swconfig.txt b/Documentation/networking/swconfig.txt
new file mode 100644
index 0000000..f560066
--- /dev/null
+++ b/Documentation/networking/swconfig.txt
@@ -0,0 +1,162 @@
+Generic Netlink Switch configuration API
+
+Introduction
+============
+
+The following documentation covers the Linux Ethernet switch configuration API
+which is based on the Generic Netlink infrastructure.
+
+Scope and rationale
+===================
+
+Most Ethernet switches found in small routers are managed switches which allow
+the following operations:
+
+- configure a port to belong to a particular set of VLANs either as tagged or
+  untagged
+- configure a particular port to advertise specific link/speed/duplex settings
+- collect statistics about the number of packets/bytes transferred/received
+- any other vendor specific feature: rate limiting, single/double tagging...
+
+Such switches can be connected to the controlling CPU using different hardware
+busses, but most commonly:
+
+- SPI/I2C/GPIO bitbanging
+- MDIO
+- Memory mapped into the CPU register address space
+
+As of today the usual way to configure such a switch was either to write a
+specific driver or to write an user-space application which would have to know
+about the hardware differences and figure out a way to access the switch
+registers (spidev, SIOCIGGMIIREG, mmap...) from user-space.
+
+This has multiple issues:
+
+- proliferation of ad-hoc solutions to configure a switch both open source and
+  proprietary
+
+- absence of common software reference for switches commonly found on the market
+  (Broadcom, Lantiq/Infineon/ADMTek, Marvell, Qualcomm/Atheros...) which implies
+  a duplication effort for each implementer
+
+- inability to leverage existing hardware representation mechanisms such as
+  Device Tree (spidev, i2c-dev.. do not belong in Device Tree and rely on
+  Linux-specific "forwarder" drivers) to describe a switch device
+
+The goal of the switch configuration API is to provide a common basis to build
+re-usable and extensible switch drivers with the following ideas in mind:
+
+- having a central point of configuration on top of which a reference user-space
+  implementation can be provided but also allow for other user-space
+  implementations to exist
+
+- ensure the Linux kernel is in control of the actual hardware access
+
+- be extensible enough to support per-switch features without making the generic
+  implementation too heavy weighted and without making user-space changes each
+  and every time a new feature is added
+
+Based on these design goals the Generic Netlink kernel/user-space communication
+mechanism was chosen because it allows for all design goals to be met.
+
+Distributed Switch Architecture vs. swconfig
+============================================
+
+The Marvell Distributed Switch Architecture drivers is an existing solution
+which is a heavy switch driver infrastructure, is Marvell centric, only
+supports MDIO connected switches, mangles an Ethernet driver transmit/receive
+paths and does not offer a central control path for the user.
+
+swconfig is vendor agnostic, does not mangle the transmit/receive path
+of an Ethernet driver and is focused on the control path of the switch rather
+that the data path. It is based on Generic Netlink to allow for each switch
+driver to easily extend the swconfig API without causing major core parts rework
+each and every time someone has a specific feature to implement and offers a
+central configuration point with a well-defined API.
+
+Switch configuration API
+========================
+
+The main data structure of the switch configuration API is a "struct switch_dev"
+which contains the following members:
+
+- a set of common operations to all switches (struct switch_dev_ops)
+- a network device pointer it is physically attached to
+- a number of physical switch ports (including CPU port)
+- a set of configured vlans
+- a CPU specific port index
+
+A particular switch device is registered/unregistered using the following pair
+of functions:
+
+register_switch(struct switch_dev *sw_dev, struct net_device *dev);
+unregister_switch(struct switch_dev);
+
+A given switch driver can be backed by any kind of underlying bus driver (i2c
+client, GPIO driver, MMIO driver, directly into the Ethernet MAC driver...).
+
+The set of common operations to all switches is represented by the "struct
+switch_dev_ops" function pointers, these common operations are defined as such:
+
+- get the port list of a VLAN identifier
+- set the port list of a VLAN identifier
+- get the primary VLAN identifier of a port
+- set the primary VLAN identifier of a port
+- apply the changed configuration to the switch
+- reset the switch
+- get a port link status
+- get a port statistics counters
+
+The switch_dev_ops structure also contains an extensible way of representing and
+querying switch specific features, 3 different types of attributes are
+available:
+
+- global attributes: attributes global to a switch (name, identifier, number of
+  ports)
+- port attributes: per-port specific attributes (MIB counters, enabling port
+  mirroring...)
+- vlan attributes: per-VLAN specific attributes (VLAN id, specific VLAN
+  information)
+
+Each of these 3 categories must be represented using an array of "struct
+switch_attr" attributes. This structure must be filed with:
+
+- an unique name for the operation
+- a description for the operation
+- a setter operation
+- a getter operation
+- a data type (string, integer, port)
+- eventual min/max limits to validate user input data
+
+The "struct switch_attr" directly maps to a Generic Netlink type of command and
+will be automatically discovered by the "swconfig" user-space utility without
+requiring user-space changes.
+
+User-space reference tool
+=========================
+
+A reference user-space implementation is provided in tools/swconfig in order to
+directly configure and use a particular switch driver. This reference
+implementation is linking against libnl-1 for the moment.
+
+To build it:
+
+make -C tools/swconfig
+
+To list the available switches:
+
+./tools/swconfig list
+
+And to show a particular switch configuration for instance:
+
+./tools/swconfig dev eth0 show
+
+Fake (simulation) switch driver
+===============================
+
+A fake switch driver called swconfig-hwsim is provided in order to allow for
+easy testing of API changes and to perform regression testing. This driver will
+automatically map to the loopback device and will create a fake switch of up to
+8 Gigabit ports. Each of these ports can be configured with separate
+speed/duplex/link settings. This driver is gated with the CONFIG_SWCONFIG_HWSIM
+configuration symbol.
diff --git a/MAINTAINERS b/MAINTAINERS
index f169259..3a54262 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8117,6 +8117,16 @@ F:	lib/swiotlb.c
 F:	arch/*/kernel/pci-swiotlb.c
 F:	include/linux/swiotlb.h
 
+SWITCH CONFIGURATION API
+M:	Florian Fainelli <f.fainelli@gmail.com>
+L:	openwrt-devel@lists.openwrt.org
+L:	netdev@vger.kernel.org
+S:	Supported
+F:	drivers/net/ethernet/phy/swconfig*.c
+F:	include/uapi/linux/switch.h
+F:	include/linux/switch.h
+F:	Documentation/networking/swconfig.txt
+
 SYNOPSYS ARC ARCHITECTURE
 M:	Vineet Gupta <vgupta@synopsys.com>
 S:	Supported
diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 342561a..9b3e117 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -12,6 +12,12 @@ menuconfig PHYLIB
 
 if PHYLIB
 
+config SWCONFIG
+	tristate "Switch configuration API"
+	---help---
+	  Switch configuration API using netlink. This allows
+	  you to configure the VLAN features of certain switches.
+
 comment "MII PHY device drivers"
 
 config AT803X_PHY
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 23a2ab2..268c7de 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -3,6 +3,7 @@
 libphy-objs			:= phy.o phy_device.o mdio_bus.o
 
 obj-$(CONFIG_PHYLIB)		+= libphy.o
+obj-$(CONFIG_SWCONFIG)		+= swconfig.o
 obj-$(CONFIG_MARVELL_PHY)	+= marvell.o
 obj-$(CONFIG_DAVICOM_PHY)	+= davicom.o
 obj-$(CONFIG_CICADA_PHY)	+= cicada.o
diff --git a/drivers/net/phy/swconfig.c b/drivers/net/phy/swconfig.c
new file mode 100644
index 0000000..9997c35
--- /dev/null
+++ b/drivers/net/phy/swconfig.c
@@ -0,0 +1,1078 @@
+/*
+ * Switch configuration API
+ *
+ * Copyright (C) 2008 Felix Fietkau <nbd@openwrt.org>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/if.h>
+#include <linux/if_ether.h>
+#include <linux/capability.h>
+#include <linux/skbuff.h>
+#include <linux/swconfig.h>
+
+#define SWCONFIG_DEVNAME	"switch%d"
+
+MODULE_AUTHOR("Felix Fietkau <nbd@openwrt.org>");
+MODULE_LICENSE("GPL");
+
+static int swdev_id;
+static struct list_head swdevs;
+static DEFINE_SPINLOCK(swdevs_lock);
+struct swconfig_callback;
+
+struct swconfig_callback {
+	struct sk_buff *msg;
+	struct genlmsghdr *hdr;
+	struct genl_info *info;
+	int cmd;
+
+	/* callback for filling in the message data */
+	int (*fill)(struct swconfig_callback *cb, void *arg);
+
+	/* callback for closing the message before sending it */
+	int (*close)(struct swconfig_callback *cb, void *arg);
+
+	struct nlattr *nest[4];
+	int args[4];
+};
+
+/* defaults */
+
+static int
+swconfig_get_vlan_ports(struct switch_dev *dev,
+			const struct switch_attr *attr, struct switch_val *val)
+{
+	int ret;
+	if (val->port_vlan >= dev->vlans)
+		return -EINVAL;
+
+	if (!dev->ops->get_vlan_ports)
+		return -EOPNOTSUPP;
+
+	ret = dev->ops->get_vlan_ports(dev, val);
+	return ret;
+}
+
+static int
+swconfig_set_vlan_ports(struct switch_dev *dev,
+			const struct switch_attr *attr, struct switch_val *val)
+{
+	struct switch_port *ports = val->value.ports;
+	const struct switch_dev_ops *ops = dev->ops;
+	int i;
+
+	if (val->port_vlan >= dev->vlans)
+		return -EINVAL;
+
+	/* validate ports */
+	if (val->len > dev->ports)
+		return -EINVAL;
+
+	if (!ops->set_vlan_ports)
+		return -EOPNOTSUPP;
+
+	for (i = 0; i < val->len; i++) {
+		if (ports[i].id >= dev->ports)
+			return -EINVAL;
+
+		if (ops->set_port_pvid &&
+		    !(ports[i].flags & (1 << SWITCH_PORT_FLAG_TAGGED)))
+			ops->set_port_pvid(dev, ports[i].id, val->port_vlan);
+	}
+
+	return ops->set_vlan_ports(dev, val);
+}
+
+static int
+swconfig_set_pvid(struct switch_dev *dev,
+			const struct switch_attr *attr, struct switch_val *val)
+{
+	if (val->port_vlan >= dev->ports)
+		return -EINVAL;
+
+	if (!dev->ops->set_port_pvid)
+		return -EOPNOTSUPP;
+
+	return dev->ops->set_port_pvid(dev, val->port_vlan, val->value.i);
+}
+
+static int
+swconfig_get_pvid(struct switch_dev *dev,
+			const struct switch_attr *attr, struct switch_val *val)
+{
+	if (val->port_vlan >= dev->ports)
+		return -EINVAL;
+
+	if (!dev->ops->get_port_pvid)
+		return -EOPNOTSUPP;
+
+	return dev->ops->get_port_pvid(dev, val->port_vlan, &val->value.i);
+}
+
+static const char *
+swconfig_speed_str(enum switch_port_speed speed)
+{
+	switch (speed) {
+	case SWITCH_PORT_SPEED_10:
+		return "10baseT";
+	case SWITCH_PORT_SPEED_100:
+		return "100baseT";
+	case SWITCH_PORT_SPEED_1000:
+		return "1000baseT";
+	default:
+		break;
+	}
+
+	return "unknown";
+}
+
+static int
+swconfig_get_link(struct switch_dev *dev,
+			const struct switch_attr *attr, struct switch_val *val)
+{
+	struct switch_port_link link;
+	int len;
+	int ret;
+
+	if (val->port_vlan >= dev->ports)
+		return -EINVAL;
+
+	if (!dev->ops->get_port_link)
+		return -EOPNOTSUPP;
+
+	memset(&link, 0, sizeof(link));
+	ret = dev->ops->get_port_link(dev, val->port_vlan, &link);
+	if (ret)
+		return ret;
+
+	memset(dev->buf, 0, sizeof(dev->buf));
+
+	if (link.link)
+		len = snprintf(dev->buf, sizeof(dev->buf),
+			       "port:%d link:up speed:%s %s-duplex %s%s%s",
+			       val->port_vlan,
+			       swconfig_speed_str(link.speed),
+			       link.duplex ? "full" : "half",
+			       link.tx_flow ? "txflow " : "",
+			       link.rx_flow ?	"rxflow " : "",
+			       link.aneg ? "auto" : "");
+	else
+		len = snprintf(dev->buf, sizeof(dev->buf), "port:%d link:down",
+			       val->port_vlan);
+
+	val->value.s = dev->buf;
+	val->len = len;
+
+	return 0;
+}
+
+static int
+swconfig_apply_config(struct switch_dev *dev,
+			const struct switch_attr *attr, struct switch_val *val)
+{
+	/* don't complain if not supported by the switch driver */
+	if (!dev->ops->apply_config)
+		return 0;
+
+	return dev->ops->apply_config(dev);
+}
+
+static int
+swconfig_reset_switch(struct switch_dev *dev,
+			const struct switch_attr *attr, struct switch_val *val)
+{
+	/* don't complain if not supported by the switch driver */
+	if (!dev->ops->reset_switch)
+		return 0;
+
+	return dev->ops->reset_switch(dev);
+}
+
+enum global_defaults {
+	GLOBAL_APPLY,
+	GLOBAL_RESET,
+};
+
+enum vlan_defaults {
+	VLAN_PORTS,
+};
+
+enum port_defaults {
+	PORT_PVID,
+	PORT_LINK,
+};
+
+static struct switch_attr default_global[] = {
+	[GLOBAL_APPLY] = {
+		.type = SWITCH_TYPE_NOVAL,
+		.name = "apply",
+		.description = "Activate changes in the hardware",
+		.set = swconfig_apply_config,
+	},
+	[GLOBAL_RESET] = {
+		.type = SWITCH_TYPE_NOVAL,
+		.name = "reset",
+		.description = "Reset the switch",
+		.set = swconfig_reset_switch,
+	}
+};
+
+static struct switch_attr default_port[] = {
+	[PORT_PVID] = {
+		.type = SWITCH_TYPE_INT,
+		.name = "pvid",
+		.description = "Primary VLAN ID",
+		.set = swconfig_set_pvid,
+		.get = swconfig_get_pvid,
+	},
+	[PORT_LINK] = {
+		.type = SWITCH_TYPE_STRING,
+		.name = "link",
+		.description = "Get port link information",
+		.set = NULL,
+		.get = swconfig_get_link,
+	}
+};
+
+static struct switch_attr default_vlan[] = {
+	[VLAN_PORTS] = {
+		.type = SWITCH_TYPE_PORTS,
+		.name = "ports",
+		.description = "VLAN port mapping",
+		.set = swconfig_set_vlan_ports,
+		.get = swconfig_get_vlan_ports,
+	},
+};
+
+static const struct switch_attr *
+swconfig_find_attr_by_name(const struct switch_attrlist *alist,
+				const char *name)
+{
+	int i;
+
+	for (i = 0; i < alist->n_attr; i++)
+		if (strcmp(name, alist->attr[i].name) == 0)
+			return &alist->attr[i];
+
+	return NULL;
+}
+
+static void swconfig_defaults_init(struct switch_dev *dev)
+{
+	const struct switch_dev_ops *ops = dev->ops;
+
+	dev->def_global = 0;
+	dev->def_vlan = 0;
+	dev->def_port = 0;
+
+	if (ops->get_vlan_ports || ops->set_vlan_ports)
+		set_bit(VLAN_PORTS, &dev->def_vlan);
+
+	if (ops->get_port_pvid || ops->set_port_pvid)
+		set_bit(PORT_PVID, &dev->def_port);
+
+	if (ops->get_port_link &&
+	    !swconfig_find_attr_by_name(&ops->attr_port, "link"))
+		set_bit(PORT_LINK, &dev->def_port);
+
+	/* always present, can be no-op */
+	set_bit(GLOBAL_APPLY, &dev->def_global);
+	set_bit(GLOBAL_RESET, &dev->def_global);
+}
+
+
+static struct genl_family switch_fam = {
+	.id = GENL_ID_GENERATE,
+	.name = "switch",
+	.hdrsize = 0,
+	.version = 1,
+	.maxattr = SWITCH_ATTR_MAX,
+};
+
+static const struct nla_policy switch_policy[SWITCH_ATTR_MAX+1] = {
+	[SWITCH_ATTR_ID] = { .type = NLA_U32 },
+	[SWITCH_ATTR_OP_ID] = { .type = NLA_U32 },
+	[SWITCH_ATTR_OP_PORT] = { .type = NLA_U32 },
+	[SWITCH_ATTR_OP_VLAN] = { .type = NLA_U32 },
+	[SWITCH_ATTR_OP_VALUE_INT] = { .type = NLA_U32 },
+	[SWITCH_ATTR_OP_VALUE_STR] = { .type = NLA_NUL_STRING },
+	[SWITCH_ATTR_OP_VALUE_PORTS] = { .type = NLA_NESTED },
+	[SWITCH_ATTR_TYPE] = { .type = NLA_U32 },
+};
+
+static const struct nla_policy port_policy[SWITCH_PORT_ATTR_MAX+1] = {
+	[SWITCH_PORT_ID] = { .type = NLA_U32 },
+	[SWITCH_PORT_FLAG_TAGGED] = { .type = NLA_FLAG },
+};
+
+static inline void
+swconfig_lock(void)
+{
+	spin_lock(&swdevs_lock);
+}
+
+static inline void
+swconfig_unlock(void)
+{
+	spin_unlock(&swdevs_lock);
+}
+
+static struct switch_dev *
+swconfig_get_dev(struct genl_info *info)
+{
+	struct switch_dev *dev = NULL;
+	struct switch_dev *p;
+	int id;
+
+	if (!info->attrs[SWITCH_ATTR_ID])
+		goto done;
+
+	id = nla_get_u32(info->attrs[SWITCH_ATTR_ID]);
+	swconfig_lock();
+	list_for_each_entry(p, &swdevs, dev_list) {
+		if (id != p->id)
+			continue;
+
+		dev = p;
+		break;
+	}
+	if (dev)
+		mutex_lock(&dev->sw_mutex);
+	else
+		pr_debug("device %d not found\n", id);
+	swconfig_unlock();
+done:
+	return dev;
+}
+
+static inline void
+swconfig_put_dev(struct switch_dev *dev)
+{
+	mutex_unlock(&dev->sw_mutex);
+}
+
+static int
+swconfig_dump_attr(struct swconfig_callback *cb, void *arg)
+{
+	struct switch_attr *op = arg;
+	struct genl_info *info = cb->info;
+	struct sk_buff *msg = cb->msg;
+	int id = cb->args[0];
+	void *hdr;
+
+	hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq, &switch_fam,
+			NLM_F_MULTI, SWITCH_CMD_NEW_ATTR);
+	if (IS_ERR(hdr))
+		return -1;
+
+	if (nla_put_u32(msg, SWITCH_ATTR_OP_ID, id))
+		goto nla_put_failure;
+	if (nla_put_u32(msg, SWITCH_ATTR_OP_TYPE, op->type))
+		goto nla_put_failure;
+	if (nla_put_string(msg, SWITCH_ATTR_OP_NAME, op->name))
+		goto nla_put_failure;
+	if (op->description)
+		if (nla_put_string(msg, SWITCH_ATTR_OP_DESCRIPTION,
+			op->description))
+			goto nla_put_failure;
+
+	return genlmsg_end(msg, hdr);
+nla_put_failure:
+	genlmsg_cancel(msg, hdr);
+	return -EMSGSIZE;
+}
+
+/* spread multipart messages across multiple message buffers */
+static int
+swconfig_send_multipart(struct swconfig_callback *cb, void *arg)
+{
+	struct genl_info *info = cb->info;
+	int restart = 0;
+	int err;
+
+	do {
+		if (!cb->msg) {
+			cb->msg = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
+			if (cb->msg == NULL)
+				goto error;
+		}
+
+		if (!(cb->fill(cb, arg) < 0))
+			break;
+
+		/* fill failed, check if this was already the second attempt */
+		if (restart)
+			goto error;
+
+		/* try again in a new message, send the current one */
+		restart = 1;
+		if (cb->close) {
+			if (cb->close(cb, arg) < 0)
+				goto error;
+		}
+		err = genlmsg_reply(cb->msg, info);
+		cb->msg = NULL;
+		if (err < 0)
+			goto error;
+
+	} while (restart);
+
+	return 0;
+
+error:
+	if (cb->msg)
+		nlmsg_free(cb->msg);
+	return -1;
+}
+
+static int
+swconfig_list_attrs(struct sk_buff *skb, struct genl_info *info)
+{
+	struct genlmsghdr *hdr = nlmsg_data(info->nlhdr);
+	const struct switch_attrlist *alist;
+	struct switch_dev *dev;
+	struct swconfig_callback cb;
+	int err = -EINVAL;
+	int i;
+
+	/* defaults */
+	struct switch_attr *def_list;
+	unsigned long *def_active;
+	int n_def;
+
+	dev = swconfig_get_dev(info);
+	if (!dev)
+		return -EINVAL;
+
+	switch (hdr->cmd) {
+	case SWITCH_CMD_LIST_GLOBAL:
+		alist = &dev->ops->attr_global;
+		def_list = default_global;
+		def_active = &dev->def_global;
+		n_def = ARRAY_SIZE(default_global);
+		break;
+	case SWITCH_CMD_LIST_VLAN:
+		alist = &dev->ops->attr_vlan;
+		def_list = default_vlan;
+		def_active = &dev->def_vlan;
+		n_def = ARRAY_SIZE(default_vlan);
+		break;
+	case SWITCH_CMD_LIST_PORT:
+		alist = &dev->ops->attr_port;
+		def_list = default_port;
+		def_active = &dev->def_port;
+		n_def = ARRAY_SIZE(default_port);
+		break;
+	default:
+		WARN_ON(1);
+		goto out;
+	}
+
+	memset(&cb, 0, sizeof(cb));
+	cb.info = info;
+	cb.fill = swconfig_dump_attr;
+	for (i = 0; i < alist->n_attr; i++) {
+		if (alist->attr[i].disabled)
+			continue;
+		cb.args[0] = i;
+		err = swconfig_send_multipart(&cb, (void *) &alist->attr[i]);
+		if (err < 0)
+			goto error;
+	}
+
+	/* defaults */
+	for (i = 0; i < n_def; i++) {
+		if (!test_bit(i, def_active))
+			continue;
+		cb.args[0] = SWITCH_ATTR_DEFAULTS_OFFSET + i;
+		err = swconfig_send_multipart(&cb, (void *) &def_list[i]);
+		if (err < 0)
+			goto error;
+	}
+	swconfig_put_dev(dev);
+
+	if (!cb.msg)
+		return 0;
+
+	return genlmsg_reply(cb.msg, info);
+
+error:
+	if (cb.msg)
+		nlmsg_free(cb.msg);
+out:
+	swconfig_put_dev(dev);
+	return err;
+}
+
+static const struct switch_attr *
+swconfig_lookup_attr(struct switch_dev *dev, struct genl_info *info,
+		struct switch_val *val)
+{
+	struct genlmsghdr *hdr = nlmsg_data(info->nlhdr);
+	const struct switch_attrlist *alist;
+	const struct switch_attr *attr = NULL;
+	int attr_id;
+
+	/* defaults */
+	struct switch_attr *def_list;
+	unsigned long *def_active;
+	int n_def;
+
+	if (!info->attrs[SWITCH_ATTR_OP_ID])
+		goto done;
+
+	switch (hdr->cmd) {
+	case SWITCH_CMD_SET_GLOBAL:
+	case SWITCH_CMD_GET_GLOBAL:
+		alist = &dev->ops->attr_global;
+		def_list = default_global;
+		def_active = &dev->def_global;
+		n_def = ARRAY_SIZE(default_global);
+		break;
+	case SWITCH_CMD_SET_VLAN:
+	case SWITCH_CMD_GET_VLAN:
+		alist = &dev->ops->attr_vlan;
+		def_list = default_vlan;
+		def_active = &dev->def_vlan;
+		n_def = ARRAY_SIZE(default_vlan);
+		if (!info->attrs[SWITCH_ATTR_OP_VLAN])
+			goto done;
+		val->port_vlan = nla_get_u32(info->attrs[SWITCH_ATTR_OP_VLAN]);
+		if (val->port_vlan >= dev->vlans)
+			goto done;
+		break;
+	case SWITCH_CMD_SET_PORT:
+	case SWITCH_CMD_GET_PORT:
+		alist = &dev->ops->attr_port;
+		def_list = default_port;
+		def_active = &dev->def_port;
+		n_def = ARRAY_SIZE(default_port);
+		if (!info->attrs[SWITCH_ATTR_OP_PORT])
+			goto done;
+		val->port_vlan = nla_get_u32(info->attrs[SWITCH_ATTR_OP_PORT]);
+		if (val->port_vlan >= dev->ports)
+			goto done;
+		break;
+	default:
+		WARN_ON(1);
+		goto done;
+	}
+
+	if (!alist)
+		goto done;
+
+	attr_id = nla_get_u32(info->attrs[SWITCH_ATTR_OP_ID]);
+	if (attr_id >= SWITCH_ATTR_DEFAULTS_OFFSET) {
+		attr_id -= SWITCH_ATTR_DEFAULTS_OFFSET;
+		if (attr_id >= n_def)
+			goto done;
+		if (!test_bit(attr_id, def_active))
+			goto done;
+		attr = &def_list[attr_id];
+	} else {
+		if (attr_id >= alist->n_attr)
+			goto done;
+		attr = &alist->attr[attr_id];
+	}
+
+	if (attr->disabled)
+		attr = NULL;
+
+done:
+	if (!attr)
+		pr_debug("attribute lookup failed\n");
+	val->attr = attr;
+	return attr;
+}
+
+static int
+swconfig_parse_ports(struct sk_buff *msg, struct nlattr *head,
+		struct switch_val *val, int max)
+{
+	struct nlattr *nla;
+	int rem;
+
+	val->len = 0;
+	nla_for_each_nested(nla, head, rem) {
+		struct nlattr *tb[SWITCH_PORT_ATTR_MAX+1];
+		struct switch_port *port = &val->value.ports[val->len];
+
+		if (val->len >= max)
+			return -EINVAL;
+
+		if (nla_parse_nested(tb, SWITCH_PORT_ATTR_MAX, nla,
+				port_policy))
+			return -EINVAL;
+
+		if (!tb[SWITCH_PORT_ID])
+			return -EINVAL;
+
+		port->id = nla_get_u32(tb[SWITCH_PORT_ID]);
+		if (tb[SWITCH_PORT_FLAG_TAGGED])
+			port->flags |= (1 << SWITCH_PORT_FLAG_TAGGED);
+		val->len++;
+	}
+
+	return 0;
+}
+
+static int
+swconfig_set_attr(struct sk_buff *skb, struct genl_info *info)
+{
+	const struct switch_attr *attr;
+	struct switch_dev *dev;
+	struct switch_val val;
+	int err = -EINVAL;
+
+	dev = swconfig_get_dev(info);
+	if (!dev)
+		return -EINVAL;
+
+	memset(&val, 0, sizeof(val));
+	attr = swconfig_lookup_attr(dev, info, &val);
+	if (!attr || !attr->set)
+		goto error;
+
+	val.attr = attr;
+	switch (attr->type) {
+	case SWITCH_TYPE_NOVAL:
+		break;
+	case SWITCH_TYPE_INT:
+		if (!info->attrs[SWITCH_ATTR_OP_VALUE_INT])
+			goto error;
+		val.value.i =
+			nla_get_u32(info->attrs[SWITCH_ATTR_OP_VALUE_INT]);
+		break;
+	case SWITCH_TYPE_STRING:
+		if (!info->attrs[SWITCH_ATTR_OP_VALUE_STR])
+			goto error;
+		val.value.s =
+			nla_data(info->attrs[SWITCH_ATTR_OP_VALUE_STR]);
+		break;
+	case SWITCH_TYPE_PORTS:
+		val.value.ports = dev->portbuf;
+		memset(dev->portbuf, 0,
+			sizeof(struct switch_port) * dev->ports);
+
+		/* TODO: implement multipart? */
+		if (info->attrs[SWITCH_ATTR_OP_VALUE_PORTS]) {
+			err = swconfig_parse_ports(skb,
+				info->attrs[SWITCH_ATTR_OP_VALUE_PORTS],
+				&val, dev->ports);
+			if (err < 0)
+				goto error;
+		} else {
+			val.len = 0;
+			err = 0;
+		}
+		break;
+	default:
+		goto error;
+	}
+
+	err = attr->set(dev, attr, &val);
+error:
+	swconfig_put_dev(dev);
+	return err;
+}
+
+static int
+swconfig_close_portlist(struct swconfig_callback *cb, void *arg)
+{
+	if (cb->nest[0])
+		nla_nest_end(cb->msg, cb->nest[0]);
+	return 0;
+}
+
+static int
+swconfig_send_port(struct swconfig_callback *cb, void *arg)
+{
+	const struct switch_port *port = arg;
+	struct nlattr *p = NULL;
+
+	if (!cb->nest[0]) {
+		cb->nest[0] = nla_nest_start(cb->msg, cb->cmd);
+		if (!cb->nest[0])
+			return -1;
+	}
+
+	p = nla_nest_start(cb->msg, SWITCH_ATTR_PORT);
+	if (!p)
+		goto error;
+
+	if (nla_put_u32(cb->msg, SWITCH_PORT_ID, port->id))
+		goto nla_put_failure;
+	if (port->flags & (1 << SWITCH_PORT_FLAG_TAGGED)) {
+		if (nla_put_flag(cb->msg, SWITCH_PORT_FLAG_TAGGED))
+			goto nla_put_failure;
+	}
+
+	nla_nest_end(cb->msg, p);
+	return 0;
+
+nla_put_failure:
+		nla_nest_cancel(cb->msg, p);
+error:
+	nla_nest_cancel(cb->msg, cb->nest[0]);
+	return -1;
+}
+
+static int
+swconfig_send_ports(struct sk_buff **msg, struct genl_info *info, int attr,
+		const struct switch_val *val)
+{
+	struct swconfig_callback cb;
+	int err = 0;
+	int i;
+
+	if (!val->value.ports)
+		return -EINVAL;
+
+	memset(&cb, 0, sizeof(cb));
+	cb.cmd = attr;
+	cb.msg = *msg;
+	cb.info = info;
+	cb.fill = swconfig_send_port;
+	cb.close = swconfig_close_portlist;
+
+	cb.nest[0] = nla_nest_start(cb.msg, cb.cmd);
+	for (i = 0; i < val->len; i++) {
+		err = swconfig_send_multipart(&cb, &val->value.ports[i]);
+		if (err)
+			goto done;
+	}
+	err = val->len;
+	swconfig_close_portlist(&cb, NULL);
+	*msg = cb.msg;
+
+done:
+	return err;
+}
+
+static int
+swconfig_get_attr(struct sk_buff *skb, struct genl_info *info)
+{
+	struct genlmsghdr *hdr = nlmsg_data(info->nlhdr);
+	const struct switch_attr *attr;
+	struct switch_dev *dev;
+	struct sk_buff *msg = NULL;
+	struct switch_val val;
+	int err = -EINVAL;
+	int cmd = hdr->cmd;
+
+	dev = swconfig_get_dev(info);
+	if (!dev)
+		return -EINVAL;
+
+	memset(&val, 0, sizeof(val));
+	attr = swconfig_lookup_attr(dev, info, &val);
+	if (!attr || !attr->get)
+		goto error;
+
+	if (attr->type == SWITCH_TYPE_PORTS) {
+		val.value.ports = dev->portbuf;
+		memset(dev->portbuf, 0,
+			sizeof(struct switch_port) * dev->ports);
+	}
+
+	err = attr->get(dev, attr, &val);
+	if (err)
+		goto error;
+
+	msg = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
+	if (!msg)
+		goto error;
+
+	hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq, &switch_fam,
+			0, cmd);
+	if (IS_ERR(hdr))
+		goto nla_put_failure;
+
+	switch (attr->type) {
+	case SWITCH_TYPE_INT:
+		if (nla_put_u32(msg, SWITCH_ATTR_OP_VALUE_INT, val.value.i))
+			goto nla_put_failure;
+		break;
+	case SWITCH_TYPE_STRING:
+		if (nla_put_string(msg, SWITCH_ATTR_OP_VALUE_STR, val.value.s))
+			goto nla_put_failure;
+		break;
+	case SWITCH_TYPE_PORTS:
+		err = swconfig_send_ports(&msg, info,
+				SWITCH_ATTR_OP_VALUE_PORTS, &val);
+		if (err < 0)
+			goto nla_put_failure;
+		break;
+	default:
+		pr_debug("invalid type in attribute\n");
+		err = -EINVAL;
+		goto error;
+	}
+	err = genlmsg_end(msg, hdr);
+	if (err < 0)
+		goto nla_put_failure;
+
+	swconfig_put_dev(dev);
+	return genlmsg_reply(msg, info);
+
+nla_put_failure:
+	if (msg)
+		nlmsg_free(msg);
+error:
+	swconfig_put_dev(dev);
+	if (!err)
+		err = -ENOMEM;
+	return err;
+}
+
+static int
+swconfig_send_switch(struct sk_buff *msg, u32 pid, u32 seq, int flags,
+		const struct switch_dev *dev)
+{
+	struct nlattr *p = NULL, *m = NULL;
+	void *hdr;
+	int i;
+
+	hdr = genlmsg_put(msg, pid, seq, &switch_fam, flags,
+			SWITCH_CMD_NEW_ATTR);
+	if (IS_ERR(hdr))
+		return -1;
+
+	if (nla_put_u32(msg, SWITCH_ATTR_ID, dev->id))
+		goto nla_put_failure;
+	if (nla_put_string(msg, SWITCH_ATTR_DEV_NAME, dev->devname))
+		goto nla_put_failure;
+	if (nla_put_string(msg, SWITCH_ATTR_ALIAS, dev->alias))
+		goto nla_put_failure;
+	if (nla_put_string(msg, SWITCH_ATTR_NAME, dev->name))
+		goto nla_put_failure;
+	if (nla_put_u32(msg, SWITCH_ATTR_VLANS, dev->vlans))
+		goto nla_put_failure;
+	if (nla_put_u32(msg, SWITCH_ATTR_PORTS, dev->ports))
+		goto nla_put_failure;
+	if (nla_put_u32(msg, SWITCH_ATTR_CPU_PORT, dev->cpu_port))
+		goto nla_put_failure;
+
+	m = nla_nest_start(msg, SWITCH_ATTR_PORTMAP);
+	if (!m)
+		goto nla_put_failure;
+	for (i = 0; i < dev->ports; i++) {
+		p = nla_nest_start(msg, SWITCH_ATTR_PORTS);
+		if (!p)
+			continue;
+		if (dev->portmap[i].s) {
+			if (nla_put_string(msg, SWITCH_PORTMAP_SEGMENT,
+						dev->portmap[i].s))
+				goto nla_put_failure;
+			if (nla_put_u32(msg, SWITCH_PORTMAP_VIRT,
+						dev->portmap[i].virt))
+				goto nla_put_failure;
+		}
+		nla_nest_end(msg, p);
+	}
+	nla_nest_end(msg, m);
+	return genlmsg_end(msg, hdr);
+nla_put_failure:
+	genlmsg_cancel(msg, hdr);
+	return -EMSGSIZE;
+}
+
+static int swconfig_dump_switches(struct sk_buff *skb,
+		struct netlink_callback *cb)
+{
+	struct switch_dev *dev;
+	int start = cb->args[0];
+	int idx = 0;
+
+	swconfig_lock();
+	list_for_each_entry(dev, &swdevs, dev_list) {
+		if (++idx <= start)
+			continue;
+		if (swconfig_send_switch(skb, NETLINK_CB(cb->skb).portid,
+				cb->nlh->nlmsg_seq, NLM_F_MULTI,
+				dev) < 0)
+			break;
+	}
+	swconfig_unlock();
+	cb->args[0] = idx;
+
+	return skb->len;
+}
+
+static int
+swconfig_done(struct netlink_callback *cb)
+{
+	return 0;
+}
+
+static struct genl_ops swconfig_ops[] = {
+	{
+		.cmd = SWITCH_CMD_LIST_GLOBAL,
+		.doit = swconfig_list_attrs,
+		.policy = switch_policy,
+	},
+	{
+		.cmd = SWITCH_CMD_LIST_VLAN,
+		.doit = swconfig_list_attrs,
+		.policy = switch_policy,
+	},
+	{
+		.cmd = SWITCH_CMD_LIST_PORT,
+		.doit = swconfig_list_attrs,
+		.policy = switch_policy,
+	},
+	{
+		.cmd = SWITCH_CMD_GET_GLOBAL,
+		.doit = swconfig_get_attr,
+		.policy = switch_policy,
+	},
+	{
+		.cmd = SWITCH_CMD_GET_VLAN,
+		.doit = swconfig_get_attr,
+		.policy = switch_policy,
+	},
+	{
+		.cmd = SWITCH_CMD_GET_PORT,
+		.doit = swconfig_get_attr,
+		.policy = switch_policy,
+	},
+	{
+		.cmd = SWITCH_CMD_SET_GLOBAL,
+		.doit = swconfig_set_attr,
+		.policy = switch_policy,
+	},
+	{
+		.cmd = SWITCH_CMD_SET_VLAN,
+		.doit = swconfig_set_attr,
+		.policy = switch_policy,
+	},
+	{
+		.cmd = SWITCH_CMD_SET_PORT,
+		.doit = swconfig_set_attr,
+		.policy = switch_policy,
+	},
+	{
+		.cmd = SWITCH_CMD_GET_SWITCH,
+		.dumpit = swconfig_dump_switches,
+		.policy = switch_policy,
+		.done = swconfig_done,
+	}
+};
+
+int
+register_switch(struct switch_dev *dev, struct net_device *netdev)
+{
+	struct switch_dev *sdev;
+	const int max_switches = 8 * sizeof(unsigned long);
+	unsigned long in_use = 0;
+	int i;
+
+	INIT_LIST_HEAD(&dev->dev_list);
+	if (netdev) {
+		dev->netdev = netdev;
+		if (!dev->alias)
+			dev->alias = netdev->name;
+	}
+	BUG_ON(!dev->alias);
+
+	if (dev->ports > 0) {
+		dev->portbuf = kzalloc(sizeof(struct switch_port) *
+				dev->ports, GFP_KERNEL);
+		if (!dev->portbuf)
+			return -ENOMEM;
+		dev->portmap = kzalloc(sizeof(struct switch_portmap) *
+				dev->ports, GFP_KERNEL);
+		if (!dev->portmap) {
+			kfree(dev->portbuf);
+			return -ENOMEM;
+		}
+	}
+	swconfig_defaults_init(dev);
+	mutex_init(&dev->sw_mutex);
+	swconfig_lock();
+	dev->id = ++swdev_id;
+
+	list_for_each_entry(sdev, &swdevs, dev_list) {
+		if (!sscanf(sdev->devname, SWCONFIG_DEVNAME, &i))
+			continue;
+		if (i < 0 || i > max_switches)
+			continue;
+
+		set_bit(i, &in_use);
+	}
+	i = find_first_zero_bit(&in_use, max_switches);
+
+	if (i == max_switches) {
+		swconfig_unlock();
+		return -ENFILE;
+	}
+
+	/* fill device name */
+	snprintf(dev->devname, IFNAMSIZ, SWCONFIG_DEVNAME, i);
+
+	list_add(&dev->dev_list, &swdevs);
+	swconfig_unlock();
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(register_switch);
+
+void
+unregister_switch(struct switch_dev *dev)
+{
+	kfree(dev->portbuf);
+	mutex_lock(&dev->sw_mutex);
+	swconfig_lock();
+	list_del(&dev->dev_list);
+	swconfig_unlock();
+	mutex_unlock(&dev->sw_mutex);
+}
+EXPORT_SYMBOL_GPL(unregister_switch);
+
+
+static int __init
+swconfig_init(void)
+{
+	int i, err;
+
+	INIT_LIST_HEAD(&swdevs);
+	err = genl_register_family(&switch_fam);
+	if (err)
+		return err;
+
+	for (i = 0; i < ARRAY_SIZE(swconfig_ops); i++) {
+		err = genl_register_ops(&switch_fam, &swconfig_ops[i]);
+		if (err)
+			goto unregister;
+	}
+
+	return 0;
+
+unregister:
+	genl_unregister_family(&switch_fam);
+	return err;
+}
+
+static void __exit
+swconfig_exit(void)
+{
+	genl_unregister_family(&switch_fam);
+}
+
+module_init(swconfig_init);
+module_exit(swconfig_exit);
+
diff --git a/include/linux/swconfig.h b/include/linux/swconfig.h
new file mode 100644
index 0000000..fd96eec
--- /dev/null
+++ b/include/linux/swconfig.h
@@ -0,0 +1,180 @@
+/*
+ * Switch configuration API
+ *
+ * Copyright (C) 2008 Felix Fietkau <nbd@openwrt.org>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#ifndef _LINUX_SWITCH_H
+#define _LINUX_SWITCH_H
+
+#include <net/genetlink.h>
+#include <uapi/linux/swconfig.h>
+
+struct switch_dev;
+struct switch_op;
+struct switch_val;
+struct switch_attr;
+struct switch_attrlist;
+struct switch_led_trigger;
+
+int register_switch(struct switch_dev *dev, struct net_device *netdev);
+void unregister_switch(struct switch_dev *dev);
+
+/**
+ * struct switch_attrlist - attribute list
+ *
+ * @n_attr: number of attributes
+ * @attr: pointer to the attributes array
+ */
+struct switch_attrlist {
+	int n_attr;
+	const struct switch_attr *attr;
+};
+
+enum switch_port_speed {
+	SWITCH_PORT_SPEED_UNKNOWN = 0,
+	SWITCH_PORT_SPEED_10 = 10,
+	SWITCH_PORT_SPEED_100 = 100,
+	SWITCH_PORT_SPEED_1000 = 1000,
+};
+
+struct switch_port_link {
+	bool link;
+	bool duplex;
+	bool aneg;
+	bool tx_flow;
+	bool rx_flow;
+	enum switch_port_speed speed;
+};
+
+struct switch_port_stats {
+	unsigned long tx_bytes;
+	unsigned long rx_bytes;
+};
+
+/**
+ * struct switch_dev_ops - switch driver operations
+ *
+ * @attr_global: global switch attribute list
+ * @attr_port: port attribute list
+ * @attr_vlan: vlan attribute list
+ *
+ * Callbacks:
+ *
+ * @get_vlan_ports: read the port list of a VLAN
+ * @set_vlan_ports: set the port list of a VLAN
+ *
+ * @get_port_pvid: get the primary VLAN ID of a port
+ * @set_port_pvid: set the primary VLAN ID of a port
+ *
+ * @apply_config: apply all changed settings to the switch
+ * @reset_switch: resetting the switch
+ *
+ * @get_port_link: read the port link status
+ * @get_port_stats: read the port statistics counters
+ */
+struct switch_dev_ops {
+	struct switch_attrlist attr_global, attr_port, attr_vlan;
+
+	int (*get_vlan_ports)(struct switch_dev *dev, struct switch_val *val);
+	int (*set_vlan_ports)(struct switch_dev *dev, struct switch_val *val);
+
+	int (*get_port_pvid)(struct switch_dev *dev, int port, int *val);
+	int (*set_port_pvid)(struct switch_dev *dev, int port, int val);
+
+	int (*apply_config)(struct switch_dev *dev);
+	int (*reset_switch)(struct switch_dev *dev);
+
+	int (*get_port_link)(struct switch_dev *dev, int port,
+			     struct switch_port_link *link);
+	int (*get_port_stats)(struct switch_dev *dev, int port,
+			      struct switch_port_stats *stats);
+};
+
+/**
+ * struct switch_dev - switch device
+ *
+ * @ops: switch driver operations pointer
+ * @devname: switch device name (automatically filled)
+ * @name: switch driver name returned to user-space
+ * @alias: alias name for the switch (instead of ethX) returned to user-space
+ * @netdev: network device pointer if alias is not used
+ *
+ * @ports: number of physical switch ports
+ * @vlans: number of supported VLANs
+ * @cpu_port: identifier for the CPU port
+ */
+struct switch_dev {
+	const struct switch_dev_ops *ops;
+	/* will be automatically filled */
+	char devname[IFNAMSIZ];
+
+	const char *name;
+	/* NB: either alias or netdev must be set */
+	const char *alias;
+	struct net_device *netdev;
+
+	int ports;
+	int vlans;
+	int cpu_port;
+
+	/* the following fields are internal for swconfig */
+	int id;
+	struct list_head dev_list;
+	unsigned long def_global, def_port, def_vlan;
+
+	struct mutex sw_mutex;
+	struct switch_port *portbuf;
+	struct switch_portmap *portmap;
+
+	char buf[128];
+};
+
+struct switch_port {
+	u32 id;
+	u32 flags;
+};
+
+struct switch_portmap {
+	u32 virt;
+	const char *s;
+};
+
+struct switch_val {
+	const struct switch_attr *attr;
+	int port_vlan;
+	int len;
+	union {
+		const char *s;
+		u32 i;
+		struct switch_port *ports;
+	} value;
+};
+
+struct switch_attr {
+	int disabled;
+	int type;
+	const char *name;
+	const char *description;
+
+	int (*set)(struct switch_dev *dev, const struct switch_attr *attr,
+			struct switch_val *val);
+	int (*get)(struct switch_dev *dev, const struct switch_attr *attr,
+			struct switch_val *val);
+
+	/* for driver internal use */
+	int id;
+	int ofs;
+	int max;
+};
+
+#endif /* _LINUX_SWITCH_H */
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 115add2..0a995be 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -363,6 +363,7 @@ header-y += stddef.h
 header-y += string.h
 header-y += suspend_ioctls.h
 header-y += swab.h
+header-y += swconfig.h
 header-y += synclink.h
 header-y += sysctl.h
 header-y += sysinfo.h
diff --git a/include/uapi/linux/swconfig.h b/include/uapi/linux/swconfig.h
new file mode 100644
index 0000000..17cf178
--- /dev/null
+++ b/include/uapi/linux/swconfig.h
@@ -0,0 +1,103 @@
+/*
+ * Switch configuration API
+ *
+ * Copyright (C) 2008 Felix Fietkau <nbd@openwrt.org>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _UAPI_LINUX_SWITCH_H
+#define _UAPI_LINUX_SWITCH_H
+
+#include <linux/types.h>
+#include <linux/netdevice.h>
+#include <linux/netlink.h>
+#include <linux/genetlink.h>
+#ifndef __KERNEL__
+#include <netlink/netlink.h>
+#include <netlink/genl/genl.h>
+#include <netlink/genl/ctrl.h>
+#endif
+
+/* main attributes */
+enum {
+	SWITCH_ATTR_UNSPEC,
+	/* global */
+	SWITCH_ATTR_TYPE,
+	/* device */
+	SWITCH_ATTR_ID,
+	SWITCH_ATTR_DEV_NAME,
+	SWITCH_ATTR_ALIAS,
+	SWITCH_ATTR_NAME,
+	SWITCH_ATTR_VLANS,
+	SWITCH_ATTR_PORTS,
+	SWITCH_ATTR_PORTMAP,
+	SWITCH_ATTR_CPU_PORT,
+	/* attributes */
+	SWITCH_ATTR_OP_ID,
+	SWITCH_ATTR_OP_TYPE,
+	SWITCH_ATTR_OP_NAME,
+	SWITCH_ATTR_OP_PORT,
+	SWITCH_ATTR_OP_VLAN,
+	SWITCH_ATTR_OP_VALUE_INT,
+	SWITCH_ATTR_OP_VALUE_STR,
+	SWITCH_ATTR_OP_VALUE_PORTS,
+	SWITCH_ATTR_OP_DESCRIPTION,
+	/* port lists */
+	SWITCH_ATTR_PORT,
+	SWITCH_ATTR_MAX
+};
+
+enum {
+	/* port map */
+	SWITCH_PORTMAP_PORTS,
+	SWITCH_PORTMAP_SEGMENT,
+	SWITCH_PORTMAP_VIRT,
+	SWITCH_PORTMAP_MAX
+};
+
+/* commands */
+enum {
+	SWITCH_CMD_UNSPEC,
+	SWITCH_CMD_GET_SWITCH,
+	SWITCH_CMD_NEW_ATTR,
+	SWITCH_CMD_LIST_GLOBAL,
+	SWITCH_CMD_GET_GLOBAL,
+	SWITCH_CMD_SET_GLOBAL,
+	SWITCH_CMD_LIST_PORT,
+	SWITCH_CMD_GET_PORT,
+	SWITCH_CMD_SET_PORT,
+	SWITCH_CMD_LIST_VLAN,
+	SWITCH_CMD_GET_VLAN,
+	SWITCH_CMD_SET_VLAN
+};
+
+/* data types */
+enum switch_val_type {
+	SWITCH_TYPE_UNSPEC,
+	SWITCH_TYPE_INT,
+	SWITCH_TYPE_STRING,
+	SWITCH_TYPE_PORTS,
+	SWITCH_TYPE_NOVAL,
+};
+
+/* port nested attributes */
+enum {
+	SWITCH_PORT_UNSPEC,
+	SWITCH_PORT_ID,
+	SWITCH_PORT_FLAG_TAGGED,
+	SWITCH_PORT_ATTR_MAX
+};
+
+#define SWITCH_ATTR_DEFAULTS_OFFSET	0x1000
+
+
+#endif /* _UAPI_LINUX_SWITCH_H */
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 2/4 net-next] tools: add Generic Netlink switch configuration tool
  2013-10-22 18:23 [PATCH 0/4 net-next] net: phy: add Generic Netlink switch configuration API Florian Fainelli
  2013-10-22 18:23 ` [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet " Florian Fainelli
@ 2013-10-22 18:23 ` Florian Fainelli
  2013-10-22 18:23 ` [PATCH 3/4 net-next] net: phy: add Broadcom B53 switch driver Florian Fainelli
  2013-10-22 18:23 ` [PATCH 4/4 net-next] net: phy: add fake " Florian Fainelli
  3 siblings, 0 replies; 41+ messages in thread
From: Florian Fainelli @ 2013-10-22 18:23 UTC (permalink / raw)
  To: netdev; +Cc: davem, s.hauer, nbd, blogic, jogo, gary, Florian Fainelli

Add the user-space configuration tool for the Generic Netlink switch
configuration API. This tool is currently linking with libnl-1 and uses
pkg-config to discover the netlink library to use.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John Crispin <blogic@openwrt.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 MAINTAINERS               |   1 +
 tools/Makefile            |  10 +-
 tools/swconfig/.gitignore |   2 +
 tools/swconfig/Makefile   |  15 +
 tools/swconfig/cli.c      | 328 +++++++++++++++++++
 tools/swconfig/swlib.c    | 786 ++++++++++++++++++++++++++++++++++++++++++++++
 tools/swconfig/swlib.h    | 244 ++++++++++++++
 7 files changed, 1385 insertions(+), 1 deletion(-)
 create mode 100644 tools/swconfig/.gitignore
 create mode 100644 tools/swconfig/Makefile
 create mode 100644 tools/swconfig/cli.c
 create mode 100644 tools/swconfig/swlib.c
 create mode 100644 tools/swconfig/swlib.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 3a54262..bdd5b0f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8126,6 +8126,7 @@ F:	drivers/net/ethernet/phy/swconfig*.c
 F:	include/uapi/linux/switch.h
 F:	include/linux/switch.h
 F:	Documentation/networking/swconfig.txt
+F:	tools/swconfig/
 
 SYNOPSYS ARC ARCHITECTURE
 M:	Vineet Gupta <vgupta@synopsys.com>
diff --git a/tools/Makefile b/tools/Makefile
index 41067f3..2dbb2695 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -15,6 +15,7 @@ help:
 	@echo '  net        - misc networking tools'
 	@echo '  vm         - misc vm tools'
 	@echo '  x86_energy_perf_policy - Intel energy policy tool'
+	@echo '  swconfig   - netlink switch configuration tool'
 	@echo ''
 	@echo 'You can do:'
 	@echo ' $$ make -C tools/ <tool>_install'
@@ -50,6 +51,9 @@ selftests: FORCE
 turbostat x86_energy_perf_policy: FORCE
 	$(call descend,power/x86/$@)
 
+swconfig: FORCE
+	$(call descend,swconfig)
+
 cpupower_install:
 	$(call descend,power/$(@:_install=),install)
 
@@ -84,8 +88,12 @@ selftests_clean:
 turbostat_clean x86_energy_perf_policy_clean:
 	$(call descend,power/x86/$(@:_clean=),clean)
 
+swconfig_clean:
+	$(call descend,swconfig,clean)
+
 clean: cgroup_clean cpupower_clean firewire_clean lguest_clean perf_clean \
 		selftests_clean turbostat_clean usb_clean virtio_clean \
-		vm_clean net_clean x86_energy_perf_policy_clean
+		vm_clean net_clean x86_energy_perf_policy_clean \
+		swconfig_clean
 
 .PHONY: FORCE
diff --git a/tools/swconfig/.gitignore b/tools/swconfig/.gitignore
new file mode 100644
index 0000000..b23b79b
--- /dev/null
+++ b/tools/swconfig/.gitignore
@@ -0,0 +1,2 @@
+/swconfig
+/*.o
diff --git a/tools/swconfig/Makefile b/tools/swconfig/Makefile
new file mode 100644
index 0000000..96158678
--- /dev/null
+++ b/tools/swconfig/Makefile
@@ -0,0 +1,15 @@
+ifndef CFLAGS
+CFLAGS = -O2 -g -D_GNU_SOURCE $(shell pkg-config --cflags libnl-1)
+endif
+LIBS=$(shell pkg-config --libs libnl-1)
+
+all: swconfig
+
+%.o: %.c
+	$(CC) $(CFLAGS) -c -o $@ $^
+
+swconfig: cli.o swlib.o
+	$(CC) $(LDFLAGS) -o $@ $^ $(LIBS)
+
+clean:
+	-rm *.o swconfig
diff --git a/tools/swconfig/cli.c b/tools/swconfig/cli.c
new file mode 100644
index 0000000..9752c95
--- /dev/null
+++ b/tools/swconfig/cli.c
@@ -0,0 +1,328 @@
+/*
+ * swconfig.c: Switch configuration utility
+ *
+ * Copyright (C) 2008 Felix Fietkau <nbd@openwrt.org>
+ * Copyright (C) 2010 Martin Mares <mj@ucw.cz>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundatio.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <stdbool.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <stdint.h>
+#include <getopt.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+
+#include <linux/types.h>
+#include <linux/netlink.h>
+#include <linux/genetlink.h>
+#include <netlink/netlink.h>
+#include <netlink/genl/genl.h>
+#include <netlink/genl/ctrl.h>
+#include "../../include/uapi/linux/swconfig.h"
+#include "swlib.h"
+
+enum {
+	CMD_NONE,
+	CMD_GET,
+	CMD_SET,
+	CMD_LOAD,
+	CMD_HELP,
+	CMD_SHOW,
+	CMD_PORTMAP,
+};
+
+static void
+print_attrs(const struct switch_attr *attr)
+{
+	int i = 0;
+	while (attr) {
+		const char *type;
+		switch(attr->type) {
+			case SWITCH_TYPE_INT:
+				type = "int";
+				break;
+			case SWITCH_TYPE_STRING:
+				type = "string";
+				break;
+			case SWITCH_TYPE_PORTS:
+				type = "ports";
+				break;
+			case SWITCH_TYPE_NOVAL:
+				type = "none";
+				break;
+			default:
+				type = "unknown";
+				break;
+		}
+		printf("\tAttribute %d (%s): %s (%s)\n", ++i, type, attr->name, attr->description);
+		attr = attr->next;
+	}
+}
+
+static void
+list_attributes(struct switch_dev *dev)
+{
+	printf("%s: %s(%s), ports: %d (cpu @ %d), vlans: %d\n", dev->dev_name, dev->alias, dev->name, dev->ports, dev->cpu_port, dev->vlans);
+	printf("     --switch\n");
+	print_attrs(dev->ops);
+	printf("     --vlan\n");
+	print_attrs(dev->vlan_ops);
+	printf("     --port\n");
+	print_attrs(dev->port_ops);
+}
+
+static void
+print_attr_val(const struct switch_attr *attr, const struct switch_val *val)
+{
+	int i;
+
+	switch (attr->type) {
+	case SWITCH_TYPE_INT:
+		printf("%d", val->value.i);
+		break;
+	case SWITCH_TYPE_STRING:
+		printf("%s", val->value.s);
+		break;
+	case SWITCH_TYPE_PORTS:
+		for(i = 0; i < val->len; i++) {
+			printf("%d%s ",
+				val->value.ports[i].id,
+				(val->value.ports[i].flags &
+				 SWLIB_PORT_FLAG_TAGGED) ? "t" : "");
+		}
+		break;
+	default:
+		printf("?unknown-type?");
+	}
+}
+
+static void
+show_attrs(struct switch_dev *dev, struct switch_attr *attr, struct switch_val *val)
+{
+	while (attr) {
+		if (attr->type != SWITCH_TYPE_NOVAL) {
+			printf("\t%s: ", attr->name);
+			if (swlib_get_attr(dev, attr, val) < 0)
+				printf("???");
+			else
+				print_attr_val(attr, val);
+			putchar('\n');
+		}
+		attr = attr->next;
+	}
+}
+
+static void
+show_global(struct switch_dev *dev)
+{
+	struct switch_val val;
+
+	printf("Global attributes:\n");
+	show_attrs(dev, dev->ops, &val);
+}
+
+static void
+show_port(struct switch_dev *dev, int port)
+{
+	struct switch_val val;
+
+	printf("Port %d:\n", port);
+	val.port_vlan = port;
+	show_attrs(dev, dev->port_ops, &val);
+}
+
+static void
+show_vlan(struct switch_dev *dev, int vlan, bool all)
+{
+	struct switch_val val;
+	struct switch_attr *attr;
+
+	val.port_vlan = vlan;
+
+	if (all) {
+		attr = swlib_lookup_attr(dev, SWLIB_ATTR_GROUP_VLAN, "ports");
+		if (swlib_get_attr(dev, attr, &val) < 0)
+			return;
+
+		if (!val.len)
+			return;
+	}
+
+	printf("VLAN %d:\n", vlan);
+	show_attrs(dev, dev->vlan_ops, &val);
+}
+
+static void
+print_usage(void)
+{
+	printf("swconfig list\n");
+	printf("swconfig dev <dev> [port <port>|vlan <vlan>] (help|set <key> <value>|get <key>|load <config>|show)\n");
+	exit(1);
+}
+
+int main(int argc, char **argv)
+{
+	int retval = 0;
+	struct switch_dev *dev;
+	struct switch_attr *a;
+	struct switch_val val;
+	int err;
+	int i;
+
+	int cmd = CMD_NONE;
+	char *cdev = NULL;
+	int cport = -1;
+	int cvlan = -1;
+	char *ckey = NULL;
+	char *cvalue = NULL;
+	char *csegment = NULL;
+
+	if((argc == 2) && !strcmp(argv[1], "list")) {
+		swlib_list();
+		return 0;
+	}
+
+	if(argc < 4)
+		print_usage();
+
+	if(strcmp(argv[1], "dev"))
+		print_usage();
+
+	cdev = argv[2];
+
+	for(i = 3; i < argc; i++)
+	{
+		char *arg = argv[i];
+		if (cmd != CMD_NONE) {
+			print_usage();
+		} else if (!strcmp(arg, "port") && i+1 < argc) {
+			cport = atoi(argv[++i]);
+		} else if (!strcmp(arg, "vlan") && i+1 < argc) {
+			cvlan = atoi(argv[++i]);
+		} else if (!strcmp(arg, "help")) {
+			cmd = CMD_HELP;
+		} else if (!strcmp(arg, "set") && i+1 < argc) {
+			cmd = CMD_SET;
+			ckey = argv[++i];
+			if (i+1 < argc)
+				cvalue = argv[++i];
+		} else if (!strcmp(arg, "get") && i+1 < argc) {
+			cmd = CMD_GET;
+			ckey = argv[++i];
+		} else if (!strcmp(arg, "load") && i+1 < argc) {
+			if ((cport >= 0) || (cvlan >= 0))
+				print_usage();
+			cmd = CMD_LOAD;
+			ckey = argv[++i];
+		} else if (!strcmp(arg, "portmap")) {
+			if (i + 1 < argc)
+				csegment = argv[++i];
+			cmd = CMD_PORTMAP;
+		} else if (!strcmp(arg, "show")) {
+			cmd = CMD_SHOW;
+		} else {
+			print_usage();
+		}
+	}
+
+	if (cmd == CMD_NONE)
+		print_usage();
+	if (cport > -1 && cvlan > -1)
+		print_usage();
+
+	dev = swlib_connect(cdev);
+	if (!dev) {
+		fprintf(stderr, "Failed to connect to the switch\n");
+		return 1;
+	}
+
+	swlib_scan(dev);
+
+	if (cmd == CMD_GET || cmd == CMD_SET) {
+		if(cport > -1)
+			a = swlib_lookup_attr(dev, SWLIB_ATTR_GROUP_PORT, ckey);
+		else if(cvlan > -1)
+			a = swlib_lookup_attr(dev, SWLIB_ATTR_GROUP_VLAN, ckey);
+		else
+			a = swlib_lookup_attr(dev, SWLIB_ATTR_GROUP_GLOBAL, ckey);
+
+		if(!a)
+		{
+			fprintf(stderr, "Unknown attribute \"%s\"\n", ckey);
+			goto out;
+		}
+	}
+
+	switch(cmd)
+	{
+	case CMD_SET:
+		if ((a->type != SWITCH_TYPE_NOVAL) &&
+				(cvalue == NULL))
+			print_usage();
+
+		if(cvlan > -1)
+			cport = cvlan;
+
+		if(swlib_set_attr_string(dev, a, cport, cvalue) < 0)
+		{
+			fprintf(stderr, "failed\n");
+			retval = -1;
+			goto out;
+		}
+		break;
+	case CMD_GET:
+		if(cvlan > -1)
+			val.port_vlan = cvlan;
+		if(cport > -1)
+			val.port_vlan = cport;
+		if(swlib_get_attr(dev, a, &val) < 0)
+		{
+			fprintf(stderr, "failed\n");
+			retval = -1;
+			goto out;
+		}
+		print_attr_val(a, &val);
+		putchar('\n');
+		break;
+	case CMD_LOAD:
+		fprintf(stderr, "load command not supported\n");
+		break;
+	case CMD_HELP:
+		list_attributes(dev);
+		break;
+	case CMD_PORTMAP:
+		swlib_print_portmap(dev, csegment);
+		break;
+	case CMD_SHOW:
+		if (cport >= 0 || cvlan >= 0) {
+			if (cport >= 0)
+				show_port(dev, cport);
+			else
+				show_vlan(dev, cvlan, false);
+		} else {
+			show_global(dev);
+			for (i=0; i < dev->ports; i++)
+				show_port(dev, i);
+			for (i=0; i < dev->vlans; i++)
+				show_vlan(dev, i, true);
+		}
+		break;
+	}
+
+out:
+	swlib_free_all(dev);
+	return 0;
+}
diff --git a/tools/swconfig/swlib.c b/tools/swconfig/swlib.c
new file mode 100644
index 0000000..9259f77
--- /dev/null
+++ b/tools/swconfig/swlib.c
@@ -0,0 +1,786 @@
+/*
+ * swlib.c: Switch configuration API (user space part)
+ *
+ * Copyright (C) 2008 Felix Fietkau <nbd@openwrt.org>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * version 2.1 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+#include <inttypes.h>
+#include <errno.h>
+#include <stdint.h>
+#include <getopt.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include "../../include/uapi/linux/swconfig.h"
+#include "swlib.h"
+#include <netlink/netlink.h>
+#include <netlink/genl/genl.h>
+#include <netlink/genl/family.h>
+
+//#define DEBUG 1
+#ifdef DEBUG
+#define DPRINTF(fmt, ...) fprintf(stderr, "%s(%d): " fmt, __func__, __LINE__, ##__VA_ARGS__)
+#else
+#define DPRINTF(fmt, ...) do {} while (0)
+#endif
+
+static struct nl_handle *handle;
+static struct nl_cache *cache;
+static struct genl_family *family;
+static struct nlattr *tb[SWITCH_ATTR_MAX + 1];
+static int refcount = 0;
+
+static struct nla_policy port_policy[SWITCH_ATTR_MAX] = {
+	[SWITCH_PORT_ID] = { .type = NLA_U32 },
+	[SWITCH_PORT_FLAG_TAGGED] = { .type = NLA_FLAG },
+};
+
+static struct nla_policy portmap_policy[SWITCH_PORTMAP_MAX] = {
+	[SWITCH_PORTMAP_SEGMENT] = { .type = NLA_STRING },
+	[SWITCH_PORTMAP_VIRT] = { .type = NLA_U32 },
+};
+
+static inline void *
+swlib_alloc(size_t size)
+{
+	void *ptr;
+
+	ptr = malloc(size);
+	if (!ptr)
+		goto done;
+	memset(ptr, 0, size);
+
+done:
+	return ptr;
+}
+
+static int
+wait_handler(struct nl_msg *msg, void *arg)
+{
+	int *finished = arg;
+
+	*finished = 1;
+	return NL_STOP;
+}
+
+/* helper function for performing netlink requests */
+static int
+swlib_call(int cmd, int (*call)(struct nl_msg *, void *),
+		int (*data)(struct nl_msg *, void *), void *arg)
+{
+	struct nl_msg *msg;
+	struct nl_cb *cb = NULL;
+	int finished;
+	int flags = 0;
+	int err;
+
+	msg = nlmsg_alloc();
+	if (!msg) {
+		fprintf(stderr, "Out of memory!\n");
+		exit(1);
+	}
+
+	if (!data)
+		flags |= NLM_F_DUMP;
+
+	genlmsg_put(msg, NL_AUTO_PID, NL_AUTO_SEQ, genl_family_get_id(family), 0, flags, cmd, 0);
+	if (data) {
+		if (data(msg, arg) < 0)
+			goto nla_put_failure;
+	}
+
+	cb = nl_cb_alloc(NL_CB_CUSTOM);
+	if (!cb) {
+		fprintf(stderr, "nl_cb_alloc failed.\n");
+		exit(1);
+	}
+
+	err = nl_send_auto_complete(handle, msg);
+	if (err < 0) {
+		fprintf(stderr, "nl_send_auto_complete failed: %d\n", err);
+		goto out;
+	}
+
+	finished = 0;
+
+	if (call)
+		nl_cb_set(cb, NL_CB_VALID, NL_CB_CUSTOM, call, arg);
+
+	if (data)
+		nl_cb_set(cb, NL_CB_ACK, NL_CB_CUSTOM, wait_handler, &finished);
+	else
+		nl_cb_set(cb, NL_CB_FINISH, NL_CB_CUSTOM, wait_handler, &finished);
+
+	err = nl_recvmsgs(handle, cb);
+	if (err < 0) {
+		goto out;
+	}
+
+	if (!finished)
+		err = nl_wait_for_ack(handle);
+
+out:
+	if (cb)
+		nl_cb_put(cb);
+nla_put_failure:
+	nlmsg_free(msg);
+	return err;
+}
+
+static int
+send_attr(struct nl_msg *msg, void *arg)
+{
+	struct switch_val *val = arg;
+	struct switch_attr *attr = val->attr;
+
+	NLA_PUT_U32(msg, SWITCH_ATTR_ID, attr->dev->id);
+	NLA_PUT_U32(msg, SWITCH_ATTR_OP_ID, attr->id);
+	switch(attr->atype) {
+	case SWLIB_ATTR_GROUP_PORT:
+		NLA_PUT_U32(msg, SWITCH_ATTR_OP_PORT, val->port_vlan);
+		break;
+	case SWLIB_ATTR_GROUP_VLAN:
+		NLA_PUT_U32(msg, SWITCH_ATTR_OP_VLAN, val->port_vlan);
+		break;
+	default:
+		break;
+	}
+
+	return 0;
+
+nla_put_failure:
+	return -1;
+}
+
+static int
+store_port_val(struct nl_msg *msg, struct nlattr *nla, struct switch_val *val)
+{
+	struct nlattr *p;
+	int ports = val->attr->dev->ports;
+	int err = 0;
+	int remaining;
+
+	if (!val->value.ports)
+		val->value.ports = malloc(sizeof(struct switch_port) * ports);
+
+	nla_for_each_nested(p, nla, remaining) {
+		struct nlattr *tb[SWITCH_PORT_ATTR_MAX+1];
+		struct switch_port *port;
+
+		if (val->len >= ports)
+			break;
+
+		err = nla_parse_nested(tb, SWITCH_PORT_ATTR_MAX, p, port_policy);
+		if (err < 0)
+			goto out;
+
+		if (!tb[SWITCH_PORT_ID])
+			continue;
+
+		port = &val->value.ports[val->len];
+		port->id = nla_get_u32(tb[SWITCH_PORT_ID]);
+		port->flags = 0;
+		if (tb[SWITCH_PORT_FLAG_TAGGED])
+			port->flags |= SWLIB_PORT_FLAG_TAGGED;
+
+		val->len++;
+	}
+
+out:
+	return err;
+}
+
+static int
+store_val(struct nl_msg *msg, void *arg)
+{
+	struct genlmsghdr *gnlh = nlmsg_data(nlmsg_hdr(msg));
+	struct switch_val *val = arg;
+	struct switch_attr *attr = val->attr;
+
+	if (!val)
+		goto error;
+
+	if (nla_parse(tb, SWITCH_ATTR_MAX - 1, genlmsg_attrdata(gnlh, 0),
+			genlmsg_attrlen(gnlh, 0), NULL) < 0) {
+		goto error;
+	}
+
+	if (tb[SWITCH_ATTR_OP_VALUE_INT])
+		val->value.i = nla_get_u32(tb[SWITCH_ATTR_OP_VALUE_INT]);
+	else if (tb[SWITCH_ATTR_OP_VALUE_STR])
+		val->value.s = strdup(nla_get_string(tb[SWITCH_ATTR_OP_VALUE_STR]));
+	else if (tb[SWITCH_ATTR_OP_VALUE_PORTS])
+		val->err = store_port_val(msg, tb[SWITCH_ATTR_OP_VALUE_PORTS], val);
+
+	val->err = 0;
+	return 0;
+
+error:
+	return NL_SKIP;
+}
+
+int
+swlib_get_attr(struct switch_dev *dev, struct switch_attr *attr, struct switch_val *val)
+{
+	int cmd;
+	int err;
+
+	switch(attr->atype) {
+	case SWLIB_ATTR_GROUP_GLOBAL:
+		cmd = SWITCH_CMD_GET_GLOBAL;
+		break;
+	case SWLIB_ATTR_GROUP_PORT:
+		cmd = SWITCH_CMD_GET_PORT;
+		break;
+	case SWLIB_ATTR_GROUP_VLAN:
+		cmd = SWITCH_CMD_GET_VLAN;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	memset(&val->value, 0, sizeof(val->value));
+	val->len = 0;
+	val->attr = attr;
+	val->err = -EINVAL;
+	err = swlib_call(cmd, store_val, send_attr, val);
+	if (!err)
+		err = val->err;
+
+	return err;
+}
+
+static int
+send_attr_ports(struct nl_msg *msg, struct switch_val *val)
+{
+	struct nlattr *n;
+	int i;
+
+	/* TODO implement multipart? */
+	if (val->len == 0)
+		goto done;
+	n = nla_nest_start(msg, SWITCH_ATTR_OP_VALUE_PORTS);
+	if (!n)
+		goto nla_put_failure;
+	for (i = 0; i < val->len; i++) {
+		struct switch_port *port = &val->value.ports[i];
+		struct nlattr *np;
+
+		np = nla_nest_start(msg, SWITCH_ATTR_PORT);
+		if (!np)
+			goto nla_put_failure;
+
+		NLA_PUT_U32(msg, SWITCH_PORT_ID, port->id);
+		if (port->flags & SWLIB_PORT_FLAG_TAGGED)
+			NLA_PUT_FLAG(msg, SWITCH_PORT_FLAG_TAGGED);
+
+		nla_nest_end(msg, np);
+	}
+	nla_nest_end(msg, n);
+done:
+	return 0;
+
+nla_put_failure:
+	return -1;
+}
+
+static int
+send_attr_val(struct nl_msg *msg, void *arg)
+{
+	struct switch_val *val = arg;
+	struct switch_attr *attr = val->attr;
+
+	if (send_attr(msg, arg))
+		goto nla_put_failure;
+
+	switch(attr->type) {
+	case SWITCH_TYPE_NOVAL:
+		break;
+	case SWITCH_TYPE_INT:
+		NLA_PUT_U32(msg, SWITCH_ATTR_OP_VALUE_INT, val->value.i);
+		break;
+	case SWITCH_TYPE_STRING:
+		if (!val->value.s)
+			goto nla_put_failure;
+		NLA_PUT_STRING(msg, SWITCH_ATTR_OP_VALUE_STR, val->value.s);
+		break;
+	case SWITCH_TYPE_PORTS:
+		if (send_attr_ports(msg, val) < 0)
+			goto nla_put_failure;
+		break;
+	default:
+		goto nla_put_failure;
+	}
+	return 0;
+
+nla_put_failure:
+	return -1;
+}
+
+int
+swlib_set_attr(struct switch_dev *dev, struct switch_attr *attr, struct switch_val *val)
+{
+	int cmd;
+
+	switch(attr->atype) {
+	case SWLIB_ATTR_GROUP_GLOBAL:
+		cmd = SWITCH_CMD_SET_GLOBAL;
+		break;
+	case SWLIB_ATTR_GROUP_PORT:
+		cmd = SWITCH_CMD_SET_PORT;
+		break;
+	case SWLIB_ATTR_GROUP_VLAN:
+		cmd = SWITCH_CMD_SET_VLAN;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	val->attr = attr;
+	return swlib_call(cmd, NULL, send_attr_val, val);
+}
+
+int swlib_set_attr_string(struct switch_dev *dev, struct switch_attr *a, int port_vlan, const char *str)
+{
+	struct switch_port *ports;
+	struct switch_val val;
+	char *ptr;
+
+	memset(&val, 0, sizeof(val));
+	val.port_vlan = port_vlan;
+	switch(a->type) {
+	case SWITCH_TYPE_INT:
+		val.value.i = atoi(str);
+		break;
+	case SWITCH_TYPE_STRING:
+		val.value.s = str;
+		break;
+	case SWITCH_TYPE_PORTS:
+		ports = alloca(sizeof(struct switch_port) * dev->ports);
+		memset(ports, 0, sizeof(struct switch_port) * dev->ports);
+		val.len = 0;
+		ptr = (char *)str;
+		while(ptr && *ptr)
+		{
+			while(*ptr && isspace(*ptr))
+				ptr++;
+
+			if (!*ptr)
+				break;
+
+			if (!isdigit(*ptr))
+				return -1;
+
+			if (val.len >= dev->ports)
+				return -1;
+
+			ports[val.len].flags = 0;
+			ports[val.len].id = strtoul(ptr, &ptr, 10);
+			while(*ptr && !isspace(*ptr)) {
+				if (*ptr == 't')
+					ports[val.len].flags |= SWLIB_PORT_FLAG_TAGGED;
+				else
+					return -1;
+
+				ptr++;
+			}
+			if (*ptr)
+				ptr++;
+			val.len++;
+		}
+		val.value.ports = ports;
+		break;
+	case SWITCH_TYPE_NOVAL:
+		if (str && !strcmp(str, "0"))
+			return 0;
+
+		break;
+	default:
+		return -1;
+	}
+	return swlib_set_attr(dev, a, &val);
+}
+
+
+struct attrlist_arg {
+	int id;
+	int atype;
+	struct switch_dev *dev;
+	struct switch_attr *prev;
+	struct switch_attr **head;
+};
+
+static int
+add_id(struct nl_msg *msg, void *arg)
+{
+	struct attrlist_arg *l = arg;
+
+	NLA_PUT_U32(msg, SWITCH_ATTR_ID, l->id);
+
+	return 0;
+nla_put_failure:
+	return -1;
+}
+
+static int
+add_attr(struct nl_msg *msg, void *ptr)
+{
+	struct genlmsghdr *gnlh = nlmsg_data(nlmsg_hdr(msg));
+	struct attrlist_arg *arg = ptr;
+	struct switch_attr *new;
+
+	if (nla_parse(tb, SWITCH_ATTR_MAX - 1, genlmsg_attrdata(gnlh, 0),
+			genlmsg_attrlen(gnlh, 0), NULL) < 0)
+		goto done;
+
+	new = swlib_alloc(sizeof(struct switch_attr));
+	if (!new)
+		goto done;
+
+	new->dev = arg->dev;
+	new->atype = arg->atype;
+	if (arg->prev) {
+		arg->prev->next = new;
+	} else {
+		arg->prev = *arg->head;
+	}
+	*arg->head = new;
+	arg->head = &new->next;
+
+	if (tb[SWITCH_ATTR_OP_ID])
+		new->id = nla_get_u32(tb[SWITCH_ATTR_OP_ID]);
+	if (tb[SWITCH_ATTR_OP_TYPE])
+		new->type = nla_get_u32(tb[SWITCH_ATTR_OP_TYPE]);
+	if (tb[SWITCH_ATTR_OP_NAME])
+		new->name = strdup(nla_get_string(tb[SWITCH_ATTR_OP_NAME]));
+	if (tb[SWITCH_ATTR_OP_DESCRIPTION])
+		new->description = strdup(nla_get_string(tb[SWITCH_ATTR_OP_DESCRIPTION]));
+
+done:
+	return NL_SKIP;
+}
+
+int
+swlib_scan(struct switch_dev *dev)
+{
+	struct attrlist_arg arg;
+
+	if (dev->ops || dev->port_ops || dev->vlan_ops)
+		return 0;
+
+	arg.atype = SWLIB_ATTR_GROUP_GLOBAL;
+	arg.dev = dev;
+	arg.id = dev->id;
+	arg.prev = NULL;
+	arg.head = &dev->ops;
+	swlib_call(SWITCH_CMD_LIST_GLOBAL, add_attr, add_id, &arg);
+
+	arg.atype = SWLIB_ATTR_GROUP_PORT;
+	arg.prev = NULL;
+	arg.head = &dev->port_ops;
+	swlib_call(SWITCH_CMD_LIST_PORT, add_attr, add_id, &arg);
+
+	arg.atype = SWLIB_ATTR_GROUP_VLAN;
+	arg.prev = NULL;
+	arg.head = &dev->vlan_ops;
+	swlib_call(SWITCH_CMD_LIST_VLAN, add_attr, add_id, &arg);
+
+	return 0;
+}
+
+struct switch_attr *swlib_lookup_attr(struct switch_dev *dev,
+		enum swlib_attr_group atype, const char *name)
+{
+	struct switch_attr *head;
+
+	if (!name || !dev)
+		return NULL;
+
+	switch(atype) {
+	case SWLIB_ATTR_GROUP_GLOBAL:
+		head = dev->ops;
+		break;
+	case SWLIB_ATTR_GROUP_PORT:
+		head = dev->port_ops;
+		break;
+	case SWLIB_ATTR_GROUP_VLAN:
+		head = dev->vlan_ops;
+		break;
+	}
+	while(head) {
+		if (!strcmp(name, head->name))
+			return head;
+		head = head->next;
+	}
+
+	return NULL;
+}
+
+static void
+swlib_priv_free(void)
+{
+	if (cache)
+		nl_cache_free(cache);
+	if (handle)
+		nl_handle_destroy(handle);
+	handle = NULL;
+	cache = NULL;
+}
+
+static int
+swlib_priv_init(void)
+{
+	int ret;
+
+	handle = nl_handle_alloc();
+	if (!handle) {
+		DPRINTF("Failed to create handle\n");
+		goto err;
+	}
+
+	if (genl_connect(handle)) {
+		DPRINTF("Failed to connect to generic netlink\n");
+		goto err;
+	}
+
+	cache = genl_ctrl_alloc_cache(handle);
+	if (!cache) {
+		DPRINTF("Failed to allocate netlink cache\n");
+		goto err;
+	}
+
+	family = genl_ctrl_search_by_name(cache, "switch");
+	if (!family) {
+		DPRINTF("Switch API not present\n");
+		goto err;
+	}
+	return 0;
+
+err:
+	swlib_priv_free();
+	return -EINVAL;
+}
+
+struct swlib_scan_arg {
+	const char *name;
+	struct switch_dev *head;
+	struct switch_dev *ptr;
+};
+
+static int
+add_port_map(struct switch_dev *dev, struct nlattr *nla)
+{
+	struct nlattr *p;
+	int err = 0, idx = 0;
+	int remaining;
+
+	dev->maps = malloc(sizeof(struct switch_portmap) * dev->ports);
+	if (!dev->maps)
+		return -1;
+	memset(dev->maps, 0, sizeof(struct switch_portmap) * dev->ports);
+
+	nla_for_each_nested(p, nla, remaining) {
+		struct nlattr *tb[SWITCH_PORTMAP_MAX+1];
+
+		if (idx >= dev->ports)
+			continue;
+
+		err = nla_parse_nested(tb, SWITCH_PORTMAP_MAX, p, portmap_policy);
+		if (err < 0)
+			continue;
+
+
+		if (tb[SWITCH_PORTMAP_SEGMENT] && tb[SWITCH_PORTMAP_VIRT]) {
+			dev->maps[idx].segment = strdup(nla_get_string(tb[SWITCH_PORTMAP_SEGMENT]));
+			dev->maps[idx].virt = nla_get_u32(tb[SWITCH_PORTMAP_VIRT]);
+		}
+		idx++;
+	}
+
+out:
+	return err;
+}
+
+
+static int
+add_switch(struct nl_msg *msg, void *arg)
+{
+	struct swlib_scan_arg *sa = arg;
+	struct genlmsghdr *gnlh = nlmsg_data(nlmsg_hdr(msg));
+	struct switch_dev *dev;
+	const char *name;
+	const char *alias;
+
+	if (nla_parse(tb, SWITCH_ATTR_MAX, genlmsg_attrdata(gnlh, 0), genlmsg_attrlen(gnlh, 0), NULL) < 0)
+		goto done;
+
+	if (!tb[SWITCH_ATTR_DEV_NAME])
+		goto done;
+
+	name = nla_get_string(tb[SWITCH_ATTR_DEV_NAME]);
+	alias = nla_get_string(tb[SWITCH_ATTR_ALIAS]);
+
+	if (sa->name && (strcmp(name, sa->name) != 0) && (strcmp(alias, sa->name) != 0))
+		goto done;
+
+	dev = swlib_alloc(sizeof(struct switch_dev));
+	if (!dev)
+		goto done;
+
+	strncpy(dev->dev_name, name, IFNAMSIZ - 1);
+	dev->alias = strdup(alias);
+	if (tb[SWITCH_ATTR_ID])
+		dev->id = nla_get_u32(tb[SWITCH_ATTR_ID]);
+	if (tb[SWITCH_ATTR_NAME])
+		dev->name = strdup(nla_get_string(tb[SWITCH_ATTR_NAME]));
+	if (tb[SWITCH_ATTR_PORTS])
+		dev->ports = nla_get_u32(tb[SWITCH_ATTR_PORTS]);
+	if (tb[SWITCH_ATTR_VLANS])
+		dev->vlans = nla_get_u32(tb[SWITCH_ATTR_VLANS]);
+	if (tb[SWITCH_ATTR_CPU_PORT])
+		dev->cpu_port = nla_get_u32(tb[SWITCH_ATTR_CPU_PORT]);
+	if (tb[SWITCH_ATTR_PORTMAP])
+		add_port_map(dev, tb[SWITCH_ATTR_PORTMAP]);
+
+	if (!sa->head) {
+		sa->head = dev;
+		sa->ptr = dev;
+	} else {
+		sa->ptr->next = dev;
+		sa->ptr = dev;
+	}
+
+	refcount++;
+done:
+	return NL_SKIP;
+}
+
+static int
+list_switch(struct nl_msg *msg, void *arg)
+{
+	struct swlib_scan_arg *sa = arg;
+	struct genlmsghdr *gnlh = nlmsg_data(nlmsg_hdr(msg));
+	struct switch_dev *dev;
+	const char *name;
+	const char *alias;
+
+	if (nla_parse(tb, SWITCH_ATTR_MAX, genlmsg_attrdata(gnlh, 0), genlmsg_attrlen(gnlh, 0), NULL) < 0)
+		goto done;
+
+	if (!tb[SWITCH_ATTR_DEV_NAME] || !tb[SWITCH_ATTR_NAME])
+		goto done;
+
+	printf("Found: %s - %s\n", nla_get_string(tb[SWITCH_ATTR_DEV_NAME]),
+		nla_get_string(tb[SWITCH_ATTR_ALIAS]));
+
+done:
+	return NL_SKIP;
+}
+
+void
+swlib_list(void)
+{
+	if (swlib_priv_init() < 0)
+		return;
+	swlib_call(SWITCH_CMD_GET_SWITCH, list_switch, NULL, NULL);
+	swlib_priv_free();
+}
+
+void
+swlib_print_portmap(struct switch_dev *dev, char *segment)
+{
+	int i;
+
+	if (segment) {
+		if (!strcmp(segment, "cpu")) {
+			printf("%d ", dev->cpu_port);
+		} else if (!strcmp(segment, "disabled")) {
+			for (i = 0; i < dev->ports; i++)
+				if (!dev->maps[i].segment)
+					printf("%d ", i);
+		} else for (i = 0; i < dev->ports; i++) {
+			if (dev->maps[i].segment && !strcmp(dev->maps[i].segment, segment))
+				printf("%d ", i);
+		}
+	} else {
+		printf("%s - %s\n", dev->dev_name, dev->name);
+		for (i = 0; i < dev->ports; i++)
+			if (i == dev->cpu_port)
+				printf("port%d:\tcpu\n", i);
+			else if (dev->maps[i].segment)
+				printf("port%d:\t%s.%d\n", i, dev->maps[i].segment, dev->maps[i].virt);
+			else
+				printf("port%d:\tdisabled\n", i);
+	}
+}
+
+struct switch_dev *
+swlib_connect(const char *name)
+{
+	struct swlib_scan_arg arg;
+	int err;
+
+	if (!refcount) {
+		if (swlib_priv_init() < 0)
+			return NULL;
+	};
+
+	arg.head = NULL;
+	arg.ptr = NULL;
+	arg.name = name;
+	swlib_call(SWITCH_CMD_GET_SWITCH, add_switch, NULL, &arg);
+
+	if (!refcount)
+		swlib_priv_free();
+
+	return arg.head;
+}
+
+static void
+swlib_free_attributes(struct switch_attr **head)
+{
+	struct switch_attr *a = *head;
+	struct switch_attr *next;
+
+	while (a) {
+		next = a->next;
+		free(a);
+		a = next;
+	}
+	*head = NULL;
+}
+
+void
+swlib_free(struct switch_dev *dev)
+{
+	swlib_free_attributes(&dev->ops);
+	swlib_free_attributes(&dev->port_ops);
+	swlib_free_attributes(&dev->vlan_ops);
+	free(dev);
+
+	if (--refcount == 0)
+		swlib_priv_free();
+}
+
+void
+swlib_free_all(struct switch_dev *dev)
+{
+	struct switch_dev *p;
+
+	while (dev) {
+		p = dev->next;
+		swlib_free(dev);
+		dev = p;
+	}
+}
diff --git a/tools/swconfig/swlib.h b/tools/swconfig/swlib.h
new file mode 100644
index 0000000..dba68b2
--- /dev/null
+++ b/tools/swconfig/swlib.h
@@ -0,0 +1,244 @@
+/*
+ * swlib.h: Switch configuration API (user space part)
+ *
+ * Copyright (C) 2008-2009 Felix Fietkau <nbd@openwrt.org>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * version 2.1 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+
+Usage of the library functions:
+
+  The main datastructure for a switch is the struct switch_device
+  To get started, you first need to use switch_connect() to probe
+  for switches and allocate an instance of this struct.
+
+  There are two possible usage modes:
+    dev = switch_connect("eth0");
+      - this call will look for a switch registered for the linux device
+  	  "eth0" and only allocate a switch_device for this particular switch.
+
+    dev = switch_connect(NULL)
+      - this will return one switch_device struct for each available
+  	  switch. The switch_device structs are chained with by ->next pointer
+
+  Then to query a switch for all available attributes, use:
+    swlib_scan(dev);
+
+  All allocated datastructures for the switch_device struct can be freed with
+    swlib_free(dev);
+  or
+    swlib_free_all(dev);
+
+  The latter traverses a whole chain of switch_device structs and frees them all
+
+  Switch attributes (struct switch_attr) are divided into three groups:
+    dev->ops:
+      - global settings
+    dev->port_ops:
+      - per-port settings
+    dev->vlan_ops:
+      - per-vlan settings
+
+  switch_lookup_attr() is a small helper function to locate attributes
+  by name.
+
+  switch_set_attr() and switch_get_attr() can alter or request the values
+  of attributes.
+
+Usage of the switch_attr struct:
+
+  ->atype: attribute group, one of:
+    - SWLIB_ATTR_GROUP_GLOBAL
+    - SWLIB_ATTR_GROUP_VLAN
+    - SWLIB_ATTR_GROUP_PORT
+
+  ->id: identifier for the attribute
+
+  ->type: data type, one of:
+    - SWITCH_TYPE_INT
+    - SWITCH_TYPE_STRING
+    - SWITCH_TYPE_PORT
+
+  ->name: short name of the attribute
+  ->description: longer description
+  ->next: pointer to the next attribute of the current group
+
+
+Usage of the switch_val struct:
+
+  When setting attributes, following members of the struct switch_val need
+  to be set up:
+
+    ->len (for attr->type == SWITCH_TYPE_PORT)
+    ->port_vlan:
+      - port number (for attr->atype == SWLIB_ATTR_GROUP_PORT), or:
+      - vlan number (for attr->atype == SWLIB_ATTR_GROUP_VLAN)
+    ->value.i (for attr->type == SWITCH_TYPE_INT)
+    ->value.s (for attr->type == SWITCH_TYPE_STRING)
+      - owned by the caller, not stored in the library internally
+    ->value.ports (for attr->type == SWITCH_TYPE_PORT)
+      - must point to an array of at lest val->len * sizeof(struct switch_port)
+
+  When getting string attributes, val->value.s must be freed by the caller
+  When getting port list attributes, an internal static buffer is used,
+  which changes from call to call.
+
+ */
+
+#ifndef __SWLIB_H
+#define __SWLIB_H
+
+enum swlib_attr_group {
+	SWLIB_ATTR_GROUP_GLOBAL,
+	SWLIB_ATTR_GROUP_VLAN,
+	SWLIB_ATTR_GROUP_PORT,
+};
+
+enum swlib_port_flags {
+	SWLIB_PORT_FLAG_TAGGED = (1 << 0),
+};
+
+
+struct switch_dev;
+struct switch_attr;
+struct switch_port;
+struct switch_port_map;
+struct switch_val;
+
+struct switch_dev {
+	int id;
+	char dev_name[IFNAMSIZ];
+	const char *name;
+	const char *alias;
+	int ports;
+	int vlans;
+	int cpu_port;
+	struct switch_attr *ops;
+	struct switch_attr *port_ops;
+	struct switch_attr *vlan_ops;
+	struct switch_portmap *maps;
+	struct switch_dev *next;
+	void *priv;
+};
+
+struct switch_val {
+	struct switch_attr *attr;
+	int len;
+	int err;
+	int port_vlan;
+	union {
+		const char *s;
+		int i;
+		struct switch_port *ports;
+	} value;
+};
+
+struct switch_attr {
+	struct switch_dev *dev;
+	int atype;
+	int id;
+	int type;
+	const char *name;
+	const char *description;
+	struct switch_attr *next;
+};
+
+struct switch_port {
+	unsigned int id;
+	unsigned int flags;
+};
+
+struct switch_portmap {
+	unsigned int virt;
+	const char *segment;
+};
+
+/**
+ * swlib_list: list all switches
+ */
+void swlib_list(void);
+
+/**
+ * swlib_print_portmap: get portmap
+ * @dev: switch device struct
+ */
+void swlib_print_portmap(struct switch_dev *dev, char *segment);
+
+/**
+ * swlib_connect: connect to the switch through netlink
+ * @name: name of the ethernet interface,
+ *
+ * if name is NULL, it connect and builds a chain of all switches
+ */
+struct switch_dev *swlib_connect(const char *name);
+
+/**
+ * swlib_free: free all dynamically allocated data for the switch connection
+ * @dev: switch device struct
+ *
+ * all members of a switch device chain (generated by swlib_connect(NULL))
+ * must be freed individually
+ */
+void swlib_free(struct switch_dev *dev);
+
+/**
+ * swlib_free_all: run swlib_free on all devices in the chain
+ * @dev: switch device struct
+ */
+void swlib_free_all(struct switch_dev *dev);
+
+/**
+ * swlib_scan: probe the switch driver for available commands/attributes
+ * @dev: switch device struct
+ */
+int swlib_scan(struct switch_dev *dev);
+
+/**
+ * swlib_lookup_attr: look up a switch attribute
+ * @dev: switch device struct
+ * @type: global, port or vlan
+ * @name: name of the attribute
+ */
+struct switch_attr *swlib_lookup_attr(struct switch_dev *dev,
+		enum swlib_attr_group atype, const char *name);
+
+/**
+ * swlib_set_attr: set the value for an attribute
+ * @dev: switch device struct
+ * @attr: switch attribute struct
+ * @val: attribute value pointer
+ * returns 0 on success
+ */
+int swlib_set_attr(struct switch_dev *dev, struct switch_attr *attr,
+		struct switch_val *val);
+
+/**
+ * swlib_set_attr_string: set the value for an attribute with type conversion
+ * @dev: switch device struct
+ * @attr: switch attribute struct
+ * @port_vlan: port or vlan (if applicable)
+ * @str: string value
+ * returns 0 on success
+ */
+int swlib_set_attr_string(struct switch_dev *dev, struct switch_attr *attr,
+		int port_vlan, const char *str);
+
+/**
+ * swlib_get_attr: get the value for an attribute
+ * @dev: switch device struct
+ * @attr: switch attribute struct
+ * @val: attribute value pointer
+ * returns 0 on success
+ * for string attributes, the result string must be freed by the caller
+ */
+int swlib_get_attr(struct switch_dev *dev, struct switch_attr *attr,
+		struct switch_val *val);
+
+#endif
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 3/4 net-next] net: phy: add Broadcom B53 switch driver
  2013-10-22 18:23 [PATCH 0/4 net-next] net: phy: add Generic Netlink switch configuration API Florian Fainelli
  2013-10-22 18:23 ` [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet " Florian Fainelli
  2013-10-22 18:23 ` [PATCH 2/4 net-next] tools: add Generic Netlink switch configuration tool Florian Fainelli
@ 2013-10-22 18:23 ` Florian Fainelli
  2013-10-22 18:23 ` [PATCH 4/4 net-next] net: phy: add fake " Florian Fainelli
  3 siblings, 0 replies; 41+ messages in thread
From: Florian Fainelli @ 2013-10-22 18:23 UTC (permalink / raw)
  To: netdev; +Cc: davem, s.hauer, nbd, blogic, jogo, gary, Florian Fainelli

From: Jonas Gorski <jogo@openwrt.org>

This patch add support for the BCM53xx aka RoboSwitch managed Ethernet
switches using the proposed swconfig Generic Netlink API.

This driver supports both SPI and MDIO connected switches and in
particular the following models: BCM5325E, BCM5365, BCM539x,
BCM53115 and BCM53125.

Support for BCM63xx integrated switches as well as BCM53xxx ARM-based
platforms will be submitted as separate driver backends.

Signed-off-by: Jonas Gorski <jogo@openwrt.org>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/phy/Kconfig             |    2 +
 drivers/net/phy/Makefile            |    1 +
 drivers/net/phy/b53/Kconfig         |   25 +
 drivers/net/phy/b53/Makefile        |    8 +
 drivers/net/phy/b53/b53_common.c    | 1336 +++++++++++++++++++++++++++++++++++
 drivers/net/phy/b53/b53_mdio.c      |  425 +++++++++++
 drivers/net/phy/b53/b53_phy_fixup.c |   55 ++
 drivers/net/phy/b53/b53_priv.h      |  282 ++++++++
 drivers/net/phy/b53/b53_regs.h      |  311 ++++++++
 drivers/net/phy/b53/b53_spi.c       |  329 +++++++++
 include/linux/platform_data/b53.h   |   32 +
 11 files changed, 2806 insertions(+)
 create mode 100644 drivers/net/phy/b53/Kconfig
 create mode 100644 drivers/net/phy/b53/Makefile
 create mode 100644 drivers/net/phy/b53/b53_common.c
 create mode 100644 drivers/net/phy/b53/b53_mdio.c
 create mode 100644 drivers/net/phy/b53/b53_phy_fixup.c
 create mode 100644 drivers/net/phy/b53/b53_priv.h
 create mode 100644 drivers/net/phy/b53/b53_regs.h
 create mode 100644 drivers/net/phy/b53/b53_spi.c
 create mode 100644 include/linux/platform_data/b53.h

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 9b3e117..d02ed5a 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -192,6 +192,8 @@ config MDIO_BUS_MUX_MMIOREG
 
 	  Currently, only 8-bit registers are supported.
 
+source "drivers/net/phy/b53/Kconfig"
+
 endif # PHYLIB
 
 config MICREL_KS8995MA
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 268c7de..1998034 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -11,6 +11,7 @@ obj-$(CONFIG_LXT_PHY)		+= lxt.o
 obj-$(CONFIG_QSEMI_PHY)		+= qsemi.o
 obj-$(CONFIG_SMSC_PHY)		+= smsc.o
 obj-$(CONFIG_VITESSE_PHY)	+= vitesse.o
+obj-$(CONFIG_B53)		+= b53/
 obj-$(CONFIG_BROADCOM_PHY)	+= broadcom.o
 obj-$(CONFIG_BCM63XX_PHY)	+= bcm63xx.o
 obj-$(CONFIG_BCM87XX_PHY)	+= bcm87xx.o
diff --git a/drivers/net/phy/b53/Kconfig b/drivers/net/phy/b53/Kconfig
new file mode 100644
index 0000000..4cbcd7e
--- /dev/null
+++ b/drivers/net/phy/b53/Kconfig
@@ -0,0 +1,25 @@
+menuconfig B53
+	tristate "Broadcom bcm53xx managed switch support"
+	depends on SWCONFIG
+	help
+	  This driver adds support for Broadcom managed switch chips. It supports
+	  BCM5325E, BCM5365, BCM539x, BCM53115 and BCM53125 as well as BCM63XX
+	  integrated switches.
+
+config B53_SPI_DRIVER
+	tristate "B53 SPI connected switch driver"
+	depends on B53 && SPI
+	help
+	  Select to enable support for registering switches configured
+	  through SPI.
+
+config B53_PHY_DRIVER
+	tristate "B53 MDIO connected switch driver"
+	depends on B53
+	select B53_PHY_FIXUP
+	help
+	  Select to enable support for registering switches configured
+	  through MDIO.
+
+config B53_PHY_FIXUP
+	bool
diff --git a/drivers/net/phy/b53/Makefile b/drivers/net/phy/b53/Makefile
new file mode 100644
index 0000000..ceb21d4
--- /dev/null
+++ b/drivers/net/phy/b53/Makefile
@@ -0,0 +1,8 @@
+obj-$(CONFIG_B53)		+= b53_common.o
+
+obj-$(CONFIG_B53_PHY_FIXUP)	+= b53_phy_fixup.o
+
+obj-$(CONFIG_B53_PHY_DRIVER)	+= b53_mdio.o
+obj-$(CONFIG_B53_SPI_DRIVER)	+= b53_spi.o
+
+ccflags-y			+= -Werror
diff --git a/drivers/net/phy/b53/b53_common.c b/drivers/net/phy/b53/b53_common.c
new file mode 100644
index 0000000..1f14ab4
--- /dev/null
+++ b/drivers/net/phy/b53/b53_common.c
@@ -0,0 +1,1336 @@
+/*
+ * B53 switch driver main logic
+ *
+ * Copyright (C) 2011-2013 Jonas Gorski <jogo@openwrt.org>
+ *
+ * Permission to use, copy, modify, and/or distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/delay.h>
+#include <linux/export.h>
+#include <linux/gpio.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/swconfig.h>
+#include <linux/platform_data/b53.h>
+
+#include "b53_regs.h"
+#include "b53_priv.h"
+
+/* buffer size needed for displaying all MIBs with max'd values */
+#define B53_BUF_SIZE	1188
+
+struct b53_mib_desc {
+	u8 size;
+	u8 offset;
+	const char *name;
+};
+
+
+/* BCM5365 MIB counters */
+static const struct b53_mib_desc b53_mibs_65[] = {
+	{ 8, 0x00, "TxOctets" },
+	{ 4, 0x08, "TxDropPkts" },
+	{ 4, 0x10, "TxBroadcastPkts" },
+	{ 4, 0x14, "TxMulticastPkts" },
+	{ 4, 0x18, "TxUnicastPkts" },
+	{ 4, 0x1c, "TxCollisions" },
+	{ 4, 0x20, "TxSingleCollision" },
+	{ 4, 0x24, "TxMultipleCollision" },
+	{ 4, 0x28, "TxDeferredTransmit" },
+	{ 4, 0x2c, "TxLateCollision" },
+	{ 4, 0x30, "TxExcessiveCollision" },
+	{ 4, 0x38, "TxPausePkts" },
+	{ 8, 0x44, "RxOctets" },
+	{ 4, 0x4c, "RxUndersizePkts" },
+	{ 4, 0x50, "RxPausePkts" },
+	{ 4, 0x54, "Pkts64Octets" },
+	{ 4, 0x58, "Pkts65to127Octets" },
+	{ 4, 0x5c, "Pkts128to255Octets" },
+	{ 4, 0x60, "Pkts256to511Octets" },
+	{ 4, 0x64, "Pkts512to1023Octets" },
+	{ 4, 0x68, "Pkts1024to1522Octets" },
+	{ 4, 0x6c, "RxOversizePkts" },
+	{ 4, 0x70, "RxJabbers" },
+	{ 4, 0x74, "RxAlignmentErrors" },
+	{ 4, 0x78, "RxFCSErrors" },
+	{ 8, 0x7c, "RxGoodOctets" },
+	{ 4, 0x84, "RxDropPkts" },
+	{ 4, 0x88, "RxUnicastPkts" },
+	{ 4, 0x8c, "RxMulticastPkts" },
+	{ 4, 0x90, "RxBroadcastPkts" },
+	{ 4, 0x94, "RxSAChanges" },
+	{ 4, 0x98, "RxFragments" },
+	{ },
+};
+
+/* BCM63xx MIB counters */
+static const struct b53_mib_desc b53_mibs_63xx[] = {
+	{ 8, 0x00, "TxOctets" },
+	{ 4, 0x08, "TxDropPkts" },
+	{ 4, 0x0c, "TxQoSPkts" },
+	{ 4, 0x10, "TxBroadcastPkts" },
+	{ 4, 0x14, "TxMulticastPkts" },
+	{ 4, 0x18, "TxUnicastPkts" },
+	{ 4, 0x1c, "TxCollisions" },
+	{ 4, 0x20, "TxSingleCollision" },
+	{ 4, 0x24, "TxMultipleCollision" },
+	{ 4, 0x28, "TxDeferredTransmit" },
+	{ 4, 0x2c, "TxLateCollision" },
+	{ 4, 0x30, "TxExcessiveCollision" },
+	{ 4, 0x38, "TxPausePkts" },
+	{ 8, 0x3c, "TxQoSOctets" },
+	{ 8, 0x44, "RxOctets" },
+	{ 4, 0x4c, "RxUndersizePkts" },
+	{ 4, 0x50, "RxPausePkts" },
+	{ 4, 0x54, "Pkts64Octets" },
+	{ 4, 0x58, "Pkts65to127Octets" },
+	{ 4, 0x5c, "Pkts128to255Octets" },
+	{ 4, 0x60, "Pkts256to511Octets" },
+	{ 4, 0x64, "Pkts512to1023Octets" },
+	{ 4, 0x68, "Pkts1024to1522Octets" },
+	{ 4, 0x6c, "RxOversizePkts" },
+	{ 4, 0x70, "RxJabbers" },
+	{ 4, 0x74, "RxAlignmentErrors" },
+	{ 4, 0x78, "RxFCSErrors" },
+	{ 8, 0x7c, "RxGoodOctets" },
+	{ 4, 0x84, "RxDropPkts" },
+	{ 4, 0x88, "RxUnicastPkts" },
+	{ 4, 0x8c, "RxMulticastPkts" },
+	{ 4, 0x90, "RxBroadcastPkts" },
+	{ 4, 0x94, "RxSAChanges" },
+	{ 4, 0x98, "RxFragments" },
+	{ 4, 0xa0, "RxSymbolErrors" },
+	{ 4, 0xa4, "RxQoSPkts" },
+	{ 8, 0xa8, "RxQoSOctets" },
+	{ 4, 0xb0, "Pkts1523to2047Octets" },
+	{ 4, 0xb4, "Pkts2048to4095Octets" },
+	{ 4, 0xb8, "Pkts4096to8191Octets" },
+	{ 4, 0xbc, "Pkts8192to9728Octets" },
+	{ 4, 0xc0, "RxDiscarded" },
+	{ }
+};
+
+/* MIB counters */
+static const struct b53_mib_desc b53_mibs[] = {
+	{ 8, 0x00, "TxOctets" },
+	{ 4, 0x08, "TxDropPkts" },
+	{ 4, 0x10, "TxBroadcastPkts" },
+	{ 4, 0x14, "TxMulticastPkts" },
+	{ 4, 0x18, "TxUnicastPkts" },
+	{ 4, 0x1c, "TxCollisions" },
+	{ 4, 0x20, "TxSingleCollision" },
+	{ 4, 0x24, "TxMultipleCollision" },
+	{ 4, 0x28, "TxDeferredTransmit" },
+	{ 4, 0x2c, "TxLateCollision" },
+	{ 4, 0x30, "TxExcessiveCollision" },
+	{ 4, 0x38, "TxPausePkts" },
+	{ 8, 0x50, "RxOctets" },
+	{ 4, 0x58, "RxUndersizePkts" },
+	{ 4, 0x5c, "RxPausePkts" },
+	{ 4, 0x60, "Pkts64Octets" },
+	{ 4, 0x64, "Pkts65to127Octets" },
+	{ 4, 0x68, "Pkts128to255Octets" },
+	{ 4, 0x6c, "Pkts256to511Octets" },
+	{ 4, 0x70, "Pkts512to1023Octets" },
+	{ 4, 0x74, "Pkts1024to1522Octets" },
+	{ 4, 0x78, "RxOversizePkts" },
+	{ 4, 0x7c, "RxJabbers" },
+	{ 4, 0x80, "RxAlignmentErrors" },
+	{ 4, 0x84, "RxFCSErrors" },
+	{ 8, 0x88, "RxGoodOctets" },
+	{ 4, 0x90, "RxDropPkts" },
+	{ 4, 0x94, "RxUnicastPkts" },
+	{ 4, 0x98, "RxMulticastPkts" },
+	{ 4, 0x9c, "RxBroadcastPkts" },
+	{ 4, 0xa0, "RxSAChanges" },
+	{ 4, 0xa4, "RxFragments" },
+	{ 4, 0xa8, "RxJumboPkts" },
+	{ 4, 0xac, "RxSymbolErrors" },
+	{ 4, 0xc0, "RxDiscarded" },
+	{ }
+};
+
+static int b53_do_vlan_op(struct b53_device *dev, u8 op)
+{
+	unsigned int i;
+
+	b53_write8(dev, B53_ARLIO_PAGE, dev->vta_regs[0], VTA_START_CMD | op);
+
+	for (i = 0; i < 10; i++) {
+		u8 vta;
+
+		b53_read8(dev, B53_ARLIO_PAGE, dev->vta_regs[0], &vta);
+		if (!(vta & VTA_START_CMD))
+			return 0;
+
+		usleep_range(100, 200);
+	}
+
+	return -EIO;
+}
+
+static void b53_set_vlan_entry(struct b53_device *dev, u16 vid, u16 members,
+			       u16 untag)
+{
+	if (is5325(dev)) {
+		u32 entry = 0;
+
+		if (members) {
+			entry = (untag << VA_UNTAG_S) | members;
+			if (dev->core_rev >= 3)
+				entry |= VA_VALID_25_R4 | vid << VA_VID_HIGH_S;
+			else
+				entry |= VA_VALID_25;
+		}
+
+		b53_write32(dev, B53_VLAN_PAGE, B53_VLAN_WRITE_25, entry);
+		b53_write16(dev, B53_VLAN_PAGE, B53_VLAN_TABLE_ACCESS_25, vid |
+			    VTA_RW_STATE_WR | VTA_RW_OP_EN);
+	} else if (is5365(dev)) {
+		u16 entry = 0;
+
+		if (members)
+			entry = (untag << VA_UNTAG_S) | members | VA_VALID_65;
+
+		b53_write16(dev, B53_VLAN_PAGE, B53_VLAN_WRITE_65, entry);
+		b53_write16(dev, B53_VLAN_PAGE, B53_VLAN_TABLE_ACCESS_65, vid |
+			    VTA_RW_STATE_WR | VTA_RW_OP_EN);
+	} else {
+		b53_write16(dev, B53_ARLIO_PAGE, dev->vta_regs[1], vid);
+		b53_write32(dev, B53_ARLIO_PAGE, dev->vta_regs[2],
+			    (untag << VTE_UNTAG_S) | members);
+
+		b53_do_vlan_op(dev, VTA_CMD_WRITE);
+	}
+}
+
+void b53_set_forwarding(struct b53_device *dev, int enable)
+{
+	u8 mgmt;
+
+	b53_read8(dev, B53_CTRL_PAGE, B53_SWITCH_MODE, &mgmt);
+
+	if (enable)
+		mgmt |= SM_SW_FWD_EN;
+	else
+		mgmt &= ~SM_SW_FWD_EN;
+
+	b53_write8(dev, B53_CTRL_PAGE, B53_SWITCH_MODE, mgmt);
+}
+
+static void b53_enable_vlan(struct b53_device *dev, int enable)
+{
+	u8 mgmt, vc0, vc1, vc4 = 0, vc5;
+
+	b53_read8(dev, B53_CTRL_PAGE, B53_SWITCH_MODE, &mgmt);
+	b53_read8(dev, B53_VLAN_PAGE, B53_VLAN_CTRL0, &vc0);
+	b53_read8(dev, B53_VLAN_PAGE, B53_VLAN_CTRL1, &vc1);
+
+	if (is5325(dev) || is5365(dev)) {
+		b53_read8(dev, B53_VLAN_PAGE, B53_VLAN_CTRL4_25, &vc4);
+		b53_read8(dev, B53_VLAN_PAGE, B53_VLAN_CTRL5_25, &vc5);
+	} else if (is63xx(dev)) {
+		b53_read8(dev, B53_VLAN_PAGE, B53_VLAN_CTRL4_63XX, &vc4);
+		b53_read8(dev, B53_VLAN_PAGE, B53_VLAN_CTRL5_63XX, &vc5);
+	} else {
+		b53_read8(dev, B53_VLAN_PAGE, B53_VLAN_CTRL4, &vc4);
+		b53_read8(dev, B53_VLAN_PAGE, B53_VLAN_CTRL5, &vc5);
+	}
+
+	mgmt &= ~SM_SW_FWD_MODE;
+
+	if (enable) {
+		vc0 |= VC0_VLAN_EN | VC0_VID_CHK_EN | VC0_VID_HASH_VID;
+		vc1 |= VC1_RX_MCST_UNTAG_EN | VC1_RX_MCST_FWD_EN;
+		vc4 &= ~VC4_ING_VID_CHECK_MASK;
+		vc4 |= VC4_ING_VID_VIO_DROP << VC4_ING_VID_CHECK_S;
+		vc5 |= VC5_DROP_VTABLE_MISS;
+
+		if (is5325(dev))
+			vc0 &= ~VC0_RESERVED_1;
+
+		if (is5325(dev) || is5365(dev))
+			vc1 |= VC1_RX_MCST_TAG_EN;
+
+		if (!is5325(dev) && !is5365(dev)) {
+			if (dev->allow_vid_4095)
+				vc5 |= VC5_VID_FFF_EN;
+			else
+				vc5 &= ~VC5_VID_FFF_EN;
+		}
+	} else {
+		vc0 &= ~(VC0_VLAN_EN | VC0_VID_CHK_EN | VC0_VID_HASH_VID);
+		vc1 &= ~(VC1_RX_MCST_UNTAG_EN | VC1_RX_MCST_FWD_EN);
+		vc4 &= ~VC4_ING_VID_CHECK_MASK;
+		vc5 &= ~VC5_DROP_VTABLE_MISS;
+
+		if (is5325(dev) || is5365(dev))
+			vc4 |= VC4_ING_VID_VIO_FWD << VC4_ING_VID_CHECK_S;
+		else
+			vc4 |= VC4_ING_VID_VIO_TO_IMP << VC4_ING_VID_CHECK_S;
+
+		if (is5325(dev) || is5365(dev))
+			vc1 &= ~VC1_RX_MCST_TAG_EN;
+
+		if (!is5325(dev) && !is5365(dev))
+			vc5 &= ~VC5_VID_FFF_EN;
+	}
+
+	b53_write8(dev, B53_VLAN_PAGE, B53_VLAN_CTRL0, vc0);
+	b53_write8(dev, B53_VLAN_PAGE, B53_VLAN_CTRL1, vc1);
+
+	if (is5325(dev) || is5365(dev)) {
+		/* enable the high 8 bit vid check on 5325 */
+		if (is5325(dev) && enable)
+			b53_write8(dev, B53_VLAN_PAGE, B53_VLAN_CTRL3,
+				   VC3_HIGH_8BIT_EN);
+		else
+			b53_write8(dev, B53_VLAN_PAGE, B53_VLAN_CTRL3, 0);
+
+		b53_write8(dev, B53_VLAN_PAGE, B53_VLAN_CTRL4_25, vc4);
+		b53_write8(dev, B53_VLAN_PAGE, B53_VLAN_CTRL5_25, vc5);
+	} else if (is63xx(dev)) {
+		b53_write16(dev, B53_VLAN_PAGE, B53_VLAN_CTRL3_63XX, 0);
+		b53_write8(dev, B53_VLAN_PAGE, B53_VLAN_CTRL4_63XX, vc4);
+		b53_write8(dev, B53_VLAN_PAGE, B53_VLAN_CTRL5_63XX, vc5);
+	} else {
+		b53_write16(dev, B53_VLAN_PAGE, B53_VLAN_CTRL3, 0);
+		b53_write8(dev, B53_VLAN_PAGE, B53_VLAN_CTRL4, vc4);
+		b53_write8(dev, B53_VLAN_PAGE, B53_VLAN_CTRL5, vc5);
+	}
+
+	b53_write8(dev, B53_CTRL_PAGE, B53_SWITCH_MODE, mgmt);
+}
+
+static int b53_set_jumbo(struct b53_device *dev, int enable, int allow_10_100)
+{
+	u32 port_mask = 0;
+	u16 max_size = JMS_MIN_SIZE;
+
+	if (is5325(dev) || is5365(dev))
+		return -EINVAL;
+
+	if (enable) {
+		port_mask = dev->enabled_ports;
+		max_size = JMS_MAX_SIZE;
+		if (allow_10_100)
+			port_mask |= JPM_10_100_JUMBO_EN;
+	}
+
+	b53_write32(dev, B53_JUMBO_PAGE, dev->jumbo_pm_reg, port_mask);
+	return b53_write16(dev, B53_JUMBO_PAGE, dev->jumbo_size_reg, max_size);
+}
+
+static int b53_flush_arl(struct b53_device *dev)
+{
+	unsigned int i;
+
+	b53_write8(dev, B53_CTRL_PAGE, B53_FAST_AGE_CTRL,
+		   FAST_AGE_DONE | FAST_AGE_DYNAMIC | FAST_AGE_STATIC);
+
+	for (i = 0; i < 10; i++) {
+		u8 fast_age_ctrl;
+
+		b53_read8(dev, B53_CTRL_PAGE, B53_FAST_AGE_CTRL,
+			  &fast_age_ctrl);
+
+		if (!(fast_age_ctrl & FAST_AGE_DONE))
+			return 0;
+
+		mdelay(1);
+	}
+
+	pr_warn("time out while flushing ARL\n");
+
+	return -EINVAL;
+}
+
+static void b53_enable_ports(struct b53_device *dev)
+{
+	unsigned i;
+
+	b53_for_each_port(dev, i) {
+		u8 port_ctrl;
+		u16 pvlan_mask;
+
+		/*
+		 * prevent leaking packets between wan and lan in unmanaged
+		 * mode through port vlans.
+		 */
+		if (dev->enable_vlan || is_cpu_port(dev, i))
+			pvlan_mask = 0x1ff;
+		else if (is531x5(dev))
+			/* BCM53115 may use a different port as cpu port */
+			pvlan_mask = BIT(dev->sw_dev.cpu_port);
+		else
+			pvlan_mask = BIT(B53_CPU_PORT);
+
+		/* BCM5325 CPU port is at 8 */
+		if ((is5325(dev) || is5365(dev)) && i == B53_CPU_PORT_25)
+			i = B53_CPU_PORT;
+
+		if (dev->chip_id == BCM5398_DEVICE_ID && (i == 6 || i == 7))
+			/* disable unused ports 6 & 7 */
+			port_ctrl = PORT_CTRL_RX_DISABLE | PORT_CTRL_TX_DISABLE;
+		else if (i == B53_CPU_PORT)
+			port_ctrl = PORT_CTRL_RX_BCST_EN |
+				    PORT_CTRL_RX_MCST_EN |
+				    PORT_CTRL_RX_UCST_EN;
+		else
+			port_ctrl = 0;
+
+		b53_write16(dev, B53_PVLAN_PAGE, B53_PVLAN_PORT_MASK(i),
+			    pvlan_mask);
+
+		/* port state is handled by bcm63xx_enet driver */
+		if (!is63xx(dev))
+			b53_write8(dev, B53_CTRL_PAGE, B53_PORT_CTRL(i),
+				   port_ctrl);
+	}
+}
+
+static void b53_enable_mib(struct b53_device *dev)
+{
+	u8 gc;
+
+	b53_read8(dev, B53_CTRL_PAGE, B53_GLOBAL_CONFIG, &gc);
+
+	gc &= ~(GC_RESET_MIB | GC_MIB_AC_EN);
+
+	b53_write8(dev, B53_CTRL_PAGE, B53_GLOBAL_CONFIG, gc);
+}
+
+static int b53_apply(struct b53_device *dev)
+{
+	int i;
+
+	/* clear all vlan entries */
+	if (is5325(dev) || is5365(dev)) {
+		for (i = 1; i < dev->sw_dev.vlans; i++)
+			b53_set_vlan_entry(dev, i, 0, 0);
+	} else {
+		b53_do_vlan_op(dev, VTA_CMD_CLEAR);
+	}
+
+	b53_enable_vlan(dev, dev->enable_vlan);
+
+	/* fill VLAN table */
+	if (dev->enable_vlan) {
+		for (i = 0; i < dev->sw_dev.vlans; i++) {
+			struct b53_vlan *vlan = &dev->vlans[i];
+
+			if (!vlan->members)
+				continue;
+
+			b53_set_vlan_entry(dev, i, vlan->members, vlan->untag);
+		}
+
+		b53_for_each_port(dev, i)
+			b53_write16(dev, B53_VLAN_PAGE,
+				    B53_VLAN_PORT_DEF_TAG(i),
+				    dev->ports[i].pvid);
+	} else {
+		b53_for_each_port(dev, i)
+			b53_write16(dev, B53_VLAN_PAGE,
+				    B53_VLAN_PORT_DEF_TAG(i), 1);
+
+	}
+
+	b53_enable_ports(dev);
+
+	if (!is5325(dev) && !is5365(dev))
+		b53_set_jumbo(dev, dev->enable_jumbo, 1);
+
+	return 0;
+}
+
+void b53_switch_reset_gpio(struct b53_device *dev)
+{
+	int gpio = dev->reset_gpio;
+
+	if (gpio < 0)
+		return;
+
+	/*
+	 * Reset sequence: RESET low(50ms)->high(20ms)
+	 */
+	gpio_set_value(gpio, 0);
+	mdelay(50);
+
+	gpio_set_value(gpio, 1);
+	mdelay(20);
+
+	dev->current_page = 0xff;
+}
+
+static int b53_switch_reset(struct b53_device *dev)
+{
+	u8 mgmt;
+
+	b53_switch_reset_gpio(dev);
+
+	if (is539x(dev)) {
+		b53_write8(dev, B53_CTRL_PAGE, B53_SOFTRESET, 0x83);
+		b53_write8(dev, B53_CTRL_PAGE, B53_SOFTRESET, 0x00);
+	}
+
+	b53_read8(dev, B53_CTRL_PAGE, B53_SWITCH_MODE, &mgmt);
+
+	if (!(mgmt & SM_SW_FWD_EN)) {
+		mgmt &= ~SM_SW_FWD_MODE;
+		mgmt |= SM_SW_FWD_EN;
+
+		b53_write8(dev, B53_CTRL_PAGE, B53_SWITCH_MODE, mgmt);
+		b53_read8(dev, B53_CTRL_PAGE, B53_SWITCH_MODE, &mgmt);
+
+		if (!(mgmt & SM_SW_FWD_EN)) {
+			pr_err("Failed to enable switch!\n");
+			return -EINVAL;
+		}
+	}
+
+	/* enable all ports */
+	b53_enable_ports(dev);
+
+	/* configure MII port if necessary */
+	if (is5325(dev)) {
+		u8 mii_port_override;
+
+		b53_read8(dev, B53_CTRL_PAGE, B53_PORT_OVERRIDE_CTRL,
+			  &mii_port_override);
+		/* reverse mii needs to be enabled */
+		if (!(mii_port_override & PORT_OVERRIDE_RV_MII_25)) {
+			b53_write8(dev, B53_CTRL_PAGE, B53_PORT_OVERRIDE_CTRL,
+				   mii_port_override | PORT_OVERRIDE_RV_MII_25);
+			b53_read8(dev, B53_CTRL_PAGE, B53_PORT_OVERRIDE_CTRL,
+				  &mii_port_override);
+
+			if (!(mii_port_override & PORT_OVERRIDE_RV_MII_25)) {
+				pr_err("Failed to enable reverse MII mode\n");
+				return -EINVAL;
+			}
+		}
+	} else if (is531x5(dev) && dev->sw_dev.cpu_port == B53_CPU_PORT) {
+		u8 mii_port_override;
+
+		b53_read8(dev, B53_CTRL_PAGE, B53_PORT_OVERRIDE_CTRL,
+			  &mii_port_override);
+		b53_write8(dev, B53_CTRL_PAGE, B53_PORT_OVERRIDE_CTRL,
+			   mii_port_override | PORT_OVERRIDE_EN |
+			   PORT_OVERRIDE_LINK);
+	}
+
+	b53_enable_mib(dev);
+
+	return b53_flush_arl(dev);
+}
+
+/*
+ * Swconfig glue functions
+ */
+
+static int b53_global_get_vlan_enable(struct switch_dev *dev,
+				      const struct switch_attr *attr,
+				      struct switch_val *val)
+{
+	struct b53_device *priv = sw_to_b53(dev);
+
+	val->value.i = priv->enable_vlan;
+
+	return 0;
+}
+
+static int b53_global_set_vlan_enable(struct switch_dev *dev,
+				      const struct switch_attr *attr,
+				      struct switch_val *val)
+{
+	struct b53_device *priv = sw_to_b53(dev);
+
+	priv->enable_vlan = val->value.i;
+
+	return 0;
+}
+
+static int b53_global_get_jumbo_enable(struct switch_dev *dev,
+				       const struct switch_attr *attr,
+				       struct switch_val *val)
+{
+	struct b53_device *priv = sw_to_b53(dev);
+
+	val->value.i = priv->enable_jumbo;
+
+	return 0;
+}
+
+static int b53_global_set_jumbo_enable(struct switch_dev *dev,
+				       const struct switch_attr *attr,
+				       struct switch_val *val)
+{
+	struct b53_device *priv = sw_to_b53(dev);
+
+	priv->enable_jumbo = val->value.i;
+
+	return 0;
+}
+
+static int b53_global_get_4095_enable(struct switch_dev *dev,
+				      const struct switch_attr *attr,
+				      struct switch_val *val)
+{
+	struct b53_device *priv = sw_to_b53(dev);
+
+	val->value.i = priv->allow_vid_4095;
+
+	return 0;
+}
+
+static int b53_global_set_4095_enable(struct switch_dev *dev,
+				      const struct switch_attr *attr,
+				      struct switch_val *val)
+{
+	struct b53_device *priv = sw_to_b53(dev);
+
+	priv->allow_vid_4095 = val->value.i;
+
+	return 0;
+}
+
+static int b53_global_get_ports(struct switch_dev *dev,
+				const struct switch_attr *attr,
+				struct switch_val *val)
+{
+	struct b53_device *priv = sw_to_b53(dev);
+
+	val->len = snprintf(priv->buf, B53_BUF_SIZE, "0x%04x",
+			    priv->enabled_ports);
+	val->value.s = priv->buf;
+
+	return 0;
+}
+
+static int b53_port_get_pvid(struct switch_dev *dev, int port, int *val)
+{
+	struct b53_device *priv = sw_to_b53(dev);
+
+	*val = priv->ports[port].pvid;
+
+	return 0;
+}
+
+static int b53_port_set_pvid(struct switch_dev *dev, int port, int val)
+{
+	struct b53_device *priv = sw_to_b53(dev);
+
+	if (val > 15 && is5325(priv))
+		return -EINVAL;
+	if (val == 4095 && !priv->allow_vid_4095)
+		return -EINVAL;
+
+	priv->ports[port].pvid = val;
+
+	return 0;
+}
+
+static int b53_vlan_get_ports(struct switch_dev *dev, struct switch_val *val)
+{
+	struct b53_device *priv = sw_to_b53(dev);
+	struct switch_port *port = &val->value.ports[0];
+	struct b53_vlan *vlan = &priv->vlans[val->port_vlan];
+	int i;
+
+	val->len = 0;
+
+	if (!vlan->members)
+		return 0;
+
+	for (i = 0; i < dev->ports; i++) {
+		if (!(vlan->members & BIT(i)))
+			continue;
+
+
+		if (!(vlan->untag & BIT(i)))
+			port->flags = BIT(SWITCH_PORT_FLAG_TAGGED);
+		else
+			port->flags = 0;
+
+		port->id = i;
+		val->len++;
+		port++;
+	}
+
+	return 0;
+}
+
+static int b53_vlan_set_ports(struct switch_dev *dev, struct switch_val *val)
+{
+	struct b53_device *priv = sw_to_b53(dev);
+	struct switch_port *port;
+	struct b53_vlan *vlan = &priv->vlans[val->port_vlan];
+	int i;
+
+	/* only BCM5325 and BCM5365 supports VID 0 */
+	if (val->port_vlan == 0 && !is5325(priv) && !is5365(priv))
+		return -EINVAL;
+
+	/* VLAN 4095 needs special handling */
+	if (val->port_vlan == 4095 && !priv->allow_vid_4095)
+		return -EINVAL;
+
+	port = &val->value.ports[0];
+	vlan->members = 0;
+	vlan->untag = 0;
+	for (i = 0; i < val->len; i++, port++) {
+		vlan->members |= BIT(port->id);
+
+		if (!(port->flags & BIT(SWITCH_PORT_FLAG_TAGGED))) {
+			vlan->untag |= BIT(port->id);
+			priv->ports[port->id].pvid = val->port_vlan;
+		};
+	}
+
+	/* ignore disabled ports */
+	vlan->members &= priv->enabled_ports;
+	vlan->untag &= priv->enabled_ports;
+
+	return 0;
+}
+
+static int b53_port_get_link(struct switch_dev *dev, int port,
+			     struct switch_port_link *link)
+{
+	struct b53_device *priv = sw_to_b53(dev);
+
+	if (is_cpu_port(priv, port)) {
+		link->link = 1;
+		link->duplex = 1;
+		link->speed = is5325(priv) || is5365(priv) ?
+				SWITCH_PORT_SPEED_100 : SWITCH_PORT_SPEED_1000;
+		link->aneg = 0;
+	} else if (priv->enabled_ports & BIT(port)) {
+		u32 speed;
+		u16 lnk, duplex;
+
+		b53_read16(priv, B53_STAT_PAGE, B53_LINK_STAT, &lnk);
+		b53_read16(priv, B53_STAT_PAGE, priv->duplex_reg, &duplex);
+
+		lnk = (lnk >> port) & 1;
+		duplex = (duplex >> port) & 1;
+
+		if (is5325(priv) || is5365(priv)) {
+			u16 tmp;
+
+			b53_read16(priv, B53_STAT_PAGE, B53_SPEED_STAT, &tmp);
+			speed = SPEED_PORT_FE(tmp, port);
+		} else {
+			b53_read32(priv, B53_STAT_PAGE, B53_SPEED_STAT, &speed);
+			speed = SPEED_PORT_GE(speed, port);
+		}
+
+		link->link = lnk;
+		if (lnk) {
+			link->duplex = duplex;
+			switch (speed) {
+			case SPEED_STAT_10M:
+				link->speed = SWITCH_PORT_SPEED_10;
+				break;
+			case SPEED_STAT_100M:
+				link->speed = SWITCH_PORT_SPEED_100;
+				break;
+			case SPEED_STAT_1000M:
+				link->speed = SWITCH_PORT_SPEED_1000;
+				break;
+			}
+		}
+
+		link->aneg = 1;
+	} else {
+		link->link = 0;
+	}
+
+	return 0;
+
+}
+
+static int b53_global_reset_switch(struct switch_dev *dev)
+{
+	struct b53_device *priv = sw_to_b53(dev);
+
+	/* reset vlans */
+	priv->enable_vlan = 0;
+	priv->enable_jumbo = 0;
+	priv->allow_vid_4095 = 0;
+
+	memset(priv->vlans, 0, sizeof(priv->vlans) * dev->vlans);
+	memset(priv->ports, 0, sizeof(priv->ports) * dev->ports);
+
+	return b53_switch_reset(priv);
+}
+
+static int b53_global_apply_config(struct switch_dev *dev)
+{
+	struct b53_device *priv = sw_to_b53(dev);
+
+	/* disable switching */
+	b53_set_forwarding(priv, 0);
+
+	b53_apply(priv);
+
+	/* enable switching */
+	b53_set_forwarding(priv, 1);
+
+	return 0;
+}
+
+
+static int b53_global_reset_mib(struct switch_dev *dev,
+				const struct switch_attr *attr,
+				struct switch_val *val)
+{
+	struct b53_device *priv = sw_to_b53(dev);
+	u8 gc;
+
+	b53_read8(priv, B53_MGMT_PAGE, B53_GLOBAL_CONFIG, &gc);
+
+	b53_write8(priv, B53_MGMT_PAGE, B53_GLOBAL_CONFIG, gc | GC_RESET_MIB);
+	mdelay(1);
+	b53_write8(priv, B53_MGMT_PAGE, B53_GLOBAL_CONFIG, gc & ~GC_RESET_MIB);
+	mdelay(1);
+
+	return 0;
+}
+
+static int b53_port_get_mib(struct switch_dev *sw_dev,
+			    const struct switch_attr *attr,
+			    struct switch_val *val)
+{
+	struct b53_device *dev = sw_to_b53(sw_dev);
+	const struct b53_mib_desc *mibs;
+	int port = val->port_vlan;
+	int len = 0;
+
+	if (!(BIT(port) & dev->enabled_ports))
+		return -1;
+
+	if (is5365(dev)) {
+		if (port == 5)
+			port = 8;
+
+		mibs = b53_mibs_65;
+	} else if (is63xx(dev)) {
+		mibs = b53_mibs_63xx;
+	} else {
+		mibs = b53_mibs;
+	}
+
+	dev->buf[0] = 0;
+
+	for (; mibs->size > 0; mibs++) {
+		u64 val;
+
+		if (mibs->size == 8) {
+			b53_read64(dev, B53_MIB_PAGE(port), mibs->offset, &val);
+		} else {
+			u32 val32;
+
+			b53_read32(dev, B53_MIB_PAGE(port), mibs->offset,
+				   &val32);
+			val = val32;
+		}
+
+		len += snprintf(dev->buf + len, B53_BUF_SIZE - len,
+				"%-20s: %llu\n", mibs->name, val);
+	}
+
+	val->len = len;
+	val->value.s = dev->buf;
+
+	return 0;
+}
+
+static struct switch_attr b53_global_ops_25[] = {
+	{
+		.type = SWITCH_TYPE_INT,
+		.name = "enable_vlan",
+		.description = "Enable VLAN mode",
+		.set = b53_global_set_vlan_enable,
+		.get = b53_global_get_vlan_enable,
+		.max = 1,
+	},
+	{
+		.type = SWITCH_TYPE_STRING,
+		.name = "ports",
+		.description = "Available ports (as bitmask)",
+		.get = b53_global_get_ports,
+	},
+};
+
+static struct switch_attr b53_global_ops_65[] = {
+	{
+		.type = SWITCH_TYPE_INT,
+		.name = "enable_vlan",
+		.description = "Enable VLAN mode",
+		.set = b53_global_set_vlan_enable,
+		.get = b53_global_get_vlan_enable,
+		.max = 1,
+	},
+	{
+		.type = SWITCH_TYPE_STRING,
+		.name = "ports",
+		.description = "Available ports (as bitmask)",
+		.get = b53_global_get_ports,
+	},
+	{
+		.type = SWITCH_TYPE_INT,
+		.name = "reset_mib",
+		.description = "Reset MIB counters",
+		.set = b53_global_reset_mib,
+	},
+};
+
+static struct switch_attr b53_global_ops[] = {
+	{
+		.type = SWITCH_TYPE_INT,
+		.name = "enable_vlan",
+		.description = "Enable VLAN mode",
+		.set = b53_global_set_vlan_enable,
+		.get = b53_global_get_vlan_enable,
+		.max = 1,
+	},
+	{
+		.type = SWITCH_TYPE_STRING,
+		.name = "ports",
+		.description = "Available Ports (as bitmask)",
+		.get = b53_global_get_ports,
+	},
+	{
+		.type = SWITCH_TYPE_INT,
+		.name = "reset_mib",
+		.description = "Reset MIB counters",
+		.set = b53_global_reset_mib,
+	},
+	{
+		.type = SWITCH_TYPE_INT,
+		.name = "enable_jumbo",
+		.description = "Enable Jumbo Frames",
+		.set = b53_global_set_jumbo_enable,
+		.get = b53_global_get_jumbo_enable,
+		.max = 1,
+	},
+	{
+		.type = SWITCH_TYPE_INT,
+		.name = "allow_vid_4095",
+		.description = "Allow VID 4095",
+		.set = b53_global_set_4095_enable,
+		.get = b53_global_get_4095_enable,
+		.max = 1,
+	},
+};
+
+static struct switch_attr b53_port_ops[] = {
+	{
+		.type = SWITCH_TYPE_STRING,
+		.name = "mib",
+		.description = "Get port's MIB counters",
+		.get = b53_port_get_mib,
+	},
+};
+
+static struct switch_attr b53_no_ops[] = {
+};
+
+static const struct switch_dev_ops b53_switch_ops_25 = {
+	.attr_global = {
+		.attr = b53_global_ops_25,
+		.n_attr = ARRAY_SIZE(b53_global_ops_25),
+	},
+	.attr_port = {
+		.attr = b53_no_ops,
+		.n_attr = ARRAY_SIZE(b53_no_ops),
+	},
+	.attr_vlan = {
+		.attr = b53_no_ops,
+		.n_attr = ARRAY_SIZE(b53_no_ops),
+	},
+
+	.get_vlan_ports = b53_vlan_get_ports,
+	.set_vlan_ports = b53_vlan_set_ports,
+	.get_port_pvid = b53_port_get_pvid,
+	.set_port_pvid = b53_port_set_pvid,
+	.apply_config = b53_global_apply_config,
+	.reset_switch = b53_global_reset_switch,
+	.get_port_link = b53_port_get_link,
+};
+
+static const struct switch_dev_ops b53_switch_ops_65 = {
+	.attr_global = {
+		.attr = b53_global_ops_65,
+		.n_attr = ARRAY_SIZE(b53_global_ops_65),
+	},
+	.attr_port = {
+		.attr = b53_port_ops,
+		.n_attr = ARRAY_SIZE(b53_port_ops),
+	},
+	.attr_vlan = {
+		.attr = b53_no_ops,
+		.n_attr = ARRAY_SIZE(b53_no_ops),
+	},
+
+	.get_vlan_ports = b53_vlan_get_ports,
+	.set_vlan_ports = b53_vlan_set_ports,
+	.get_port_pvid = b53_port_get_pvid,
+	.set_port_pvid = b53_port_set_pvid,
+	.apply_config = b53_global_apply_config,
+	.reset_switch = b53_global_reset_switch,
+	.get_port_link = b53_port_get_link,
+};
+
+static const struct switch_dev_ops b53_switch_ops = {
+	.attr_global = {
+		.attr = b53_global_ops,
+		.n_attr = ARRAY_SIZE(b53_global_ops),
+	},
+	.attr_port = {
+		.attr = b53_port_ops,
+		.n_attr = ARRAY_SIZE(b53_port_ops),
+	},
+	.attr_vlan = {
+		.attr = b53_no_ops,
+		.n_attr = ARRAY_SIZE(b53_no_ops),
+	},
+
+	.get_vlan_ports = b53_vlan_get_ports,
+	.set_vlan_ports = b53_vlan_set_ports,
+	.get_port_pvid = b53_port_get_pvid,
+	.set_port_pvid = b53_port_set_pvid,
+	.apply_config = b53_global_apply_config,
+	.reset_switch = b53_global_reset_switch,
+	.get_port_link = b53_port_get_link,
+};
+
+struct b53_chip_data {
+	u32 chip_id;
+	const char *dev_name;
+	const char *alias;
+	u16 vlans;
+	u16 enabled_ports;
+	u8 cpu_port;
+	u8 vta_regs[3];
+	u8 duplex_reg;
+	u8 jumbo_pm_reg;
+	u8 jumbo_size_reg;
+	const struct switch_dev_ops *sw_ops;
+};
+
+#define B53_VTA_REGS	\
+	{ B53_VT_ACCESS, B53_VT_INDEX, B53_VT_ENTRY }
+#define B53_VTA_REGS_9798 \
+	{ B53_VT_ACCESS_9798, B53_VT_INDEX_9798, B53_VT_ENTRY_9798 }
+#define B53_VTA_REGS_63XX \
+	{ B53_VT_ACCESS_63XX, B53_VT_INDEX_63XX, B53_VT_ENTRY_63XX }
+
+static const struct b53_chip_data b53_switch_chips[] = {
+	{
+		.chip_id = BCM5325_DEVICE_ID,
+		.dev_name = "BCM5325",
+		.alias = "bcm5325",
+		.vlans = 16,
+		.enabled_ports = 0x1f,
+		.cpu_port = B53_CPU_PORT_25,
+		.duplex_reg = B53_DUPLEX_STAT_FE,
+		.sw_ops = &b53_switch_ops_25,
+	},
+	{
+		.chip_id = BCM5365_DEVICE_ID,
+		.dev_name = "BCM5365",
+		.alias = "bcm5365",
+		.vlans = 256,
+		.enabled_ports = 0x1f,
+		.cpu_port = B53_CPU_PORT_25,
+		.duplex_reg = B53_DUPLEX_STAT_FE,
+		.sw_ops = &b53_switch_ops_65,
+	},
+	{
+		.chip_id = BCM5395_DEVICE_ID,
+		.dev_name = "BCM5395",
+		.alias = "bcm5395",
+		.vlans = 4096,
+		.enabled_ports = 0x1f,
+		.cpu_port = B53_CPU_PORT,
+		.vta_regs = B53_VTA_REGS,
+		.duplex_reg = B53_DUPLEX_STAT_GE,
+		.jumbo_pm_reg = B53_JUMBO_PORT_MASK,
+		.jumbo_size_reg = B53_JUMBO_MAX_SIZE,
+		.sw_ops = &b53_switch_ops,
+	},
+	{
+		.chip_id = BCM5397_DEVICE_ID,
+		.dev_name = "BCM5397",
+		.alias = "bcm5397",
+		.vlans = 4096,
+		.enabled_ports = 0x1f,
+		.cpu_port = B53_CPU_PORT,
+		.vta_regs = B53_VTA_REGS_9798,
+		.duplex_reg = B53_DUPLEX_STAT_GE,
+		.jumbo_pm_reg = B53_JUMBO_PORT_MASK,
+		.jumbo_size_reg = B53_JUMBO_MAX_SIZE,
+		.sw_ops = &b53_switch_ops,
+	},
+	{
+		.chip_id = BCM5398_DEVICE_ID,
+		.dev_name = "BCM5398",
+		.alias = "bcm5398",
+		.vlans = 4096,
+		.enabled_ports = 0x7f,
+		.cpu_port = B53_CPU_PORT,
+		.vta_regs = B53_VTA_REGS_9798,
+		.duplex_reg = B53_DUPLEX_STAT_GE,
+		.jumbo_pm_reg = B53_JUMBO_PORT_MASK,
+		.jumbo_size_reg = B53_JUMBO_MAX_SIZE,
+		.sw_ops = &b53_switch_ops,
+	},
+	{
+		.chip_id = BCM53115_DEVICE_ID,
+		.dev_name = "BCM53115",
+		.alias = "bcm53115",
+		.vlans = 4096,
+		.enabled_ports = 0x1f,
+		.vta_regs = B53_VTA_REGS,
+		.cpu_port = B53_CPU_PORT,
+		.duplex_reg = B53_DUPLEX_STAT_GE,
+		.jumbo_pm_reg = B53_JUMBO_PORT_MASK,
+		.jumbo_size_reg = B53_JUMBO_MAX_SIZE,
+		.sw_ops = &b53_switch_ops,
+	},
+	{
+		.chip_id = BCM53125_DEVICE_ID,
+		.dev_name = "BCM53125",
+		.alias = "bcm53125",
+		.vlans = 4096,
+		.enabled_ports = 0x1f,
+		.cpu_port = B53_CPU_PORT,
+		.vta_regs = B53_VTA_REGS,
+		.duplex_reg = B53_DUPLEX_STAT_GE,
+		.jumbo_pm_reg = B53_JUMBO_PORT_MASK,
+		.jumbo_size_reg = B53_JUMBO_MAX_SIZE,
+		.sw_ops = &b53_switch_ops,
+	},
+	{
+		.chip_id = BCM63XX_DEVICE_ID,
+		.dev_name = "BCM63xx",
+		.alias = "bcm63xx",
+		.vlans = 4096,
+		.enabled_ports = 0, /* pdata must provide them */
+		.cpu_port = B53_CPU_PORT,
+		.vta_regs = B53_VTA_REGS_63XX,
+		.duplex_reg = B53_DUPLEX_STAT_63XX,
+		.jumbo_pm_reg = B53_JUMBO_PORT_MASK_63XX,
+		.jumbo_size_reg = B53_JUMBO_MAX_SIZE_63XX,
+		.sw_ops = &b53_switch_ops,
+	},
+};
+
+int b53_switch_init(struct b53_device *dev)
+{
+	struct switch_dev *sw_dev = &dev->sw_dev;
+	unsigned i;
+	int ret;
+
+	for (i = 0; i < ARRAY_SIZE(b53_switch_chips); i++) {
+		const struct b53_chip_data *chip = &b53_switch_chips[i];
+
+		if (chip->chip_id == dev->chip_id) {
+			sw_dev->name = chip->dev_name;
+			if (!sw_dev->alias)
+				sw_dev->alias = chip->alias;
+			if (!dev->enabled_ports)
+				dev->enabled_ports = chip->enabled_ports;
+			dev->duplex_reg = chip->duplex_reg;
+			dev->vta_regs[0] = chip->vta_regs[0];
+			dev->vta_regs[1] = chip->vta_regs[1];
+			dev->vta_regs[2] = chip->vta_regs[2];
+			dev->jumbo_pm_reg = chip->jumbo_pm_reg;
+			sw_dev->ops = chip->sw_ops;
+			sw_dev->cpu_port = chip->cpu_port;
+			sw_dev->vlans = chip->vlans;
+			break;
+		}
+	}
+
+	if (!sw_dev->name)
+		return -EINVAL;
+
+	/* check which BCM5325x version we have */
+	if (is5325(dev)) {
+		u8 vc4;
+
+		b53_read8(dev, B53_VLAN_PAGE, B53_VLAN_CTRL4_25, &vc4);
+
+		/* check reserved bits */
+		switch (vc4 & 3) {
+		case 1:
+			/* BCM5325E */
+			break;
+		case 3:
+			/* BCM5325F - do not use port 4 */
+			dev->enabled_ports &= ~BIT(4);
+			break;
+		default:
+			break;
+		}
+	} else if (dev->chip_id == BCM53115_DEVICE_ID) {
+		u64 strap_value;
+
+		b53_read48(dev, B53_STAT_PAGE, B53_STRAP_VALUE, &strap_value);
+		/* use second IMP port if GMII is enabled */
+		if (strap_value & SV_GMII_CTRL_115)
+			sw_dev->cpu_port = 5;
+	}
+
+	/* cpu port is always last */
+	sw_dev->ports = sw_dev->cpu_port + 1;
+	dev->enabled_ports |= BIT(sw_dev->cpu_port);
+
+	dev->ports = devm_kzalloc(dev->dev,
+				  sizeof(struct b53_port) * sw_dev->ports,
+				  GFP_KERNEL);
+	if (!dev->ports)
+		return -ENOMEM;
+
+	dev->vlans = devm_kzalloc(dev->dev,
+				  sizeof(struct b53_vlan) * sw_dev->vlans,
+				  GFP_KERNEL);
+	if (!dev->vlans)
+		return -ENOMEM;
+
+	dev->buf = devm_kzalloc(dev->dev, B53_BUF_SIZE, GFP_KERNEL);
+	if (!dev->buf)
+		return -ENOMEM;
+
+	dev->reset_gpio = b53_switch_get_reset_gpio(dev);
+	if (dev->reset_gpio >= 0) {
+		ret = devm_gpio_request_one(dev->dev, dev->reset_gpio, GPIOF_OUT_INIT_HIGH, "robo_reset");
+		if (ret)
+			return ret;
+	}
+
+	return b53_switch_reset(dev);
+}
+
+struct b53_device *b53_switch_alloc(struct device *base, struct b53_io_ops *ops,
+				    void *priv)
+{
+	struct b53_device *dev;
+
+	dev = devm_kzalloc(base, sizeof(*dev), GFP_KERNEL);
+	if (!dev)
+		return NULL;
+
+	dev->dev = base;
+	dev->ops = ops;
+	dev->priv = priv;
+	mutex_init(&dev->reg_mutex);
+
+	return dev;
+}
+EXPORT_SYMBOL(b53_switch_alloc);
+
+int b53_switch_detect(struct b53_device *dev)
+{
+	u32 id32;
+	u16 tmp;
+	u8 id8;
+	int ret;
+
+	ret = b53_read8(dev, B53_MGMT_PAGE, B53_DEVICE_ID, &id8);
+	if (ret)
+		return ret;
+
+	switch (id8) {
+	case 0:
+		/*
+		 * BCM5325 and BCM5365 do not have this register so reads
+		 * return 0. But the read operation did succeed, so assume
+		 * this is one of them.
+		 *
+		 * Next check if we can write to the 5325's VTA register; for
+		 * 5365 it is read only.
+		 */
+
+		b53_write16(dev, B53_VLAN_PAGE, B53_VLAN_TABLE_ACCESS_25, 0xf);
+		b53_read16(dev, B53_VLAN_PAGE, B53_VLAN_TABLE_ACCESS_25, &tmp);
+
+		if (tmp == 0xf)
+			dev->chip_id = BCM5325_DEVICE_ID;
+		else
+			dev->chip_id = BCM5365_DEVICE_ID;
+		break;
+	case BCM5395_DEVICE_ID:
+	case BCM5397_DEVICE_ID:
+	case BCM5398_DEVICE_ID:
+		dev->chip_id = id8;
+		break;
+	default:
+		ret = b53_read32(dev, B53_MGMT_PAGE, B53_DEVICE_ID, &id32);
+		if (ret)
+			return ret;
+
+		switch (id32) {
+		case BCM53115_DEVICE_ID:
+		case BCM53125_DEVICE_ID:
+			dev->chip_id = id32;
+			break;
+		default:
+			pr_err("unsupported switch detected (BCM53%02x/BCM%x)\n",
+			       id8, id32);
+			return -ENODEV;
+		}
+	}
+
+	if (dev->chip_id == BCM5325_DEVICE_ID)
+		return b53_read8(dev, B53_STAT_PAGE, B53_REV_ID_25,
+				 &dev->core_rev);
+	else
+		return b53_read8(dev, B53_MGMT_PAGE, B53_REV_ID,
+				 &dev->core_rev);
+}
+EXPORT_SYMBOL(b53_switch_detect);
+
+int b53_switch_register(struct b53_device *dev)
+{
+	int ret;
+
+	if (dev->pdata) {
+		dev->chip_id = dev->pdata->chip_id;
+		dev->enabled_ports = dev->pdata->enabled_ports;
+		dev->sw_dev.alias = dev->pdata->alias;
+	}
+
+	if (!dev->chip_id && b53_switch_detect(dev))
+		return -EINVAL;
+
+	ret = b53_switch_init(dev);
+	if (ret)
+		return ret;
+
+	pr_info("found switch: %s, rev %i\n", dev->sw_dev.name, dev->core_rev);
+
+	return register_switch(&dev->sw_dev, NULL);
+}
+EXPORT_SYMBOL(b53_switch_register);
+
+MODULE_AUTHOR("Jonas Gorski <jogo@openwrt.org>");
+MODULE_DESCRIPTION("B53 switch library");
+MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/phy/b53/b53_mdio.c b/drivers/net/phy/b53/b53_mdio.c
new file mode 100644
index 0000000..6403cc6
--- /dev/null
+++ b/drivers/net/phy/b53/b53_mdio.c
@@ -0,0 +1,425 @@
+/*
+ * B53 register access through MII registers
+ *
+ * Copyright (C) 2011-2013 Jonas Gorski <jogo@openwrt.org>
+ *
+ * Permission to use, copy, modify, and/or distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include <linux/kernel.h>
+#include <linux/phy.h>
+#include <linux/module.h>
+
+#include "b53_priv.h"
+
+#define B53_PSEUDO_PHY	0x1e /* Register Access Pseudo PHY */
+
+/* MII registers */
+#define REG_MII_PAGE    0x10    /* MII Page register */
+#define REG_MII_ADDR    0x11    /* MII Address register */
+#define REG_MII_DATA0   0x18    /* MII Data register 0 */
+#define REG_MII_DATA1   0x19    /* MII Data register 1 */
+#define REG_MII_DATA2   0x1a    /* MII Data register 2 */
+#define REG_MII_DATA3   0x1b    /* MII Data register 3 */
+
+#define REG_MII_PAGE_ENABLE     BIT(0)
+#define REG_MII_ADDR_WRITE      BIT(0)
+#define REG_MII_ADDR_READ       BIT(1)
+
+static int b53_mdio_op(struct b53_device *dev, u8 page, u8 reg, u16 op)
+{
+	int i;
+	u16 v;
+	int ret;
+	struct mii_bus *bus = dev->priv;
+
+	if (dev->current_page != page) {
+		/* set page number */
+		v = (page << 8) | REG_MII_PAGE_ENABLE;
+		ret = mdiobus_write(bus, B53_PSEUDO_PHY, REG_MII_PAGE, v);
+		if (ret)
+			return ret;
+		dev->current_page = page;
+	}
+
+	/* set register address */
+	v = (reg << 8) | op;
+	ret = mdiobus_write(bus, B53_PSEUDO_PHY, REG_MII_ADDR, v);
+	if (ret)
+		return ret;
+
+	/* check if operation completed */
+	for (i = 0; i < 5; ++i) {
+		v = mdiobus_read(bus, B53_PSEUDO_PHY, REG_MII_ADDR);
+		if (!(v & (REG_MII_ADDR_WRITE | REG_MII_ADDR_READ)))
+			break;
+		usleep_range(10, 100);
+	}
+
+	if (WARN_ON(i == 5))
+		return -EIO;
+
+	return 0;
+}
+
+static int b53_mdio_read8(struct b53_device *dev, u8 page, u8 reg, u8 *val)
+{
+	struct mii_bus *bus = dev->priv;
+	int ret;
+
+	ret = b53_mdio_op(dev, page, reg, REG_MII_ADDR_READ);
+	if (ret)
+		return ret;
+
+	*val = mdiobus_read(bus, B53_PSEUDO_PHY, REG_MII_DATA0) & 0xff;
+
+	return 0;
+}
+
+static int b53_mdio_read16(struct b53_device *dev, u8 page, u8 reg, u16 *val)
+{
+	struct mii_bus *bus = dev->priv;
+	int ret;
+
+	ret = b53_mdio_op(dev, page, reg, REG_MII_ADDR_READ);
+	if (ret)
+		return ret;
+
+	*val = mdiobus_read(bus, B53_PSEUDO_PHY, REG_MII_DATA0);
+
+	return 0;
+}
+
+static int b53_mdio_read32(struct b53_device *dev, u8 page, u8 reg, u32 *val)
+{
+	struct mii_bus *bus = dev->priv;
+	int ret;
+
+	ret = b53_mdio_op(dev, page, reg, REG_MII_ADDR_READ);
+	if (ret)
+		return ret;
+
+	*val = mdiobus_read(bus, B53_PSEUDO_PHY, REG_MII_DATA0);
+	*val |= mdiobus_read(bus, B53_PSEUDO_PHY, REG_MII_DATA1) << 16;
+
+	return 0;
+}
+
+static int b53_mdio_read48(struct b53_device *dev, u8 page, u8 reg, u64 *val)
+{
+	struct mii_bus *bus = dev->priv;
+	u64 temp = 0;
+	int i;
+	int ret;
+
+	ret = b53_mdio_op(dev, page, reg, REG_MII_ADDR_READ);
+	if (ret)
+		return ret;
+
+	for (i = 2; i >= 0; i--) {
+		temp <<= 16;
+		temp |= mdiobus_read(bus, B53_PSEUDO_PHY, REG_MII_DATA0 + i);
+	}
+
+	*val = temp;
+
+	return 0;
+}
+
+static int b53_mdio_read64(struct b53_device *dev, u8 page, u8 reg, u64 *val)
+{
+	struct mii_bus *bus = dev->priv;
+	u64 temp = 0;
+	int i;
+	int ret;
+
+	ret = b53_mdio_op(dev, page, reg, REG_MII_ADDR_READ);
+	if (ret)
+		return ret;
+
+	for (i = 3; i >= 0; i--) {
+		temp <<= 16;
+		temp |= mdiobus_read(bus, B53_PSEUDO_PHY, REG_MII_DATA0 + i);
+	}
+
+	*val = temp;
+
+	return 0;
+}
+
+static int b53_mdio_write8(struct b53_device *dev, u8 page, u8 reg, u8 value)
+{
+	struct mii_bus *bus = dev->priv;
+	int ret;
+
+	ret = mdiobus_write(bus, B53_PSEUDO_PHY, REG_MII_DATA0, value);
+	if (ret)
+		return ret;
+
+	return b53_mdio_op(dev, page, reg, REG_MII_ADDR_WRITE);
+}
+
+static int b53_mdio_write16(struct b53_device *dev, u8 page, u8 reg,
+			     u16 value)
+{
+	struct mii_bus *bus = dev->priv;
+	int ret;
+
+	ret = mdiobus_write(bus, B53_PSEUDO_PHY, REG_MII_DATA0, value);
+	if (ret)
+		return ret;
+
+	return b53_mdio_op(dev, page, reg, REG_MII_ADDR_WRITE);
+}
+
+static int b53_mdio_write32(struct b53_device *dev, u8 page, u8 reg,
+				    u32 value)
+{
+	struct mii_bus *bus = dev->priv;
+	unsigned int i;
+	u32 temp = value;
+
+	for (i = 0; i < 2; i++) {
+		int ret = mdiobus_write(bus, B53_PSEUDO_PHY, REG_MII_DATA0 + i,
+				    temp & 0xffff);
+		if (ret)
+			return ret;
+		temp >>= 16;
+	}
+
+	return b53_mdio_op(dev, page, reg, REG_MII_ADDR_WRITE);
+
+}
+
+static int b53_mdio_write48(struct b53_device *dev, u8 page, u8 reg,
+				    u64 value)
+{
+	struct mii_bus *bus = dev->priv;
+	unsigned i;
+	u64 temp = value;
+
+	for (i = 0; i < 3; i++) {
+		int ret = mdiobus_write(bus, B53_PSEUDO_PHY, REG_MII_DATA0 + i,
+				    temp & 0xffff);
+		if (ret)
+			return ret;
+		temp >>= 16;
+	}
+
+	return b53_mdio_op(dev, page, reg, REG_MII_ADDR_WRITE);
+
+}
+
+static int b53_mdio_write64(struct b53_device *dev, u8 page, u8 reg,
+			     u64 value)
+{
+	struct mii_bus *bus = dev->priv;
+	unsigned i;
+	u64 temp = value;
+
+	for (i = 0; i < 4; i++) {
+		int ret = mdiobus_write(bus, B53_PSEUDO_PHY, REG_MII_DATA0 + i,
+				    temp & 0xffff);
+		if (ret)
+			return ret;
+		temp >>= 16;
+	}
+
+	return b53_mdio_op(dev, page, reg, REG_MII_ADDR_WRITE);
+}
+
+static struct b53_io_ops b53_mdio_ops = {
+	.read8 = b53_mdio_read8,
+	.read16 = b53_mdio_read16,
+	.read32 = b53_mdio_read32,
+	.read48 = b53_mdio_read48,
+	.read64 = b53_mdio_read64,
+	.write8 = b53_mdio_write8,
+	.write16 = b53_mdio_write16,
+	.write32 = b53_mdio_write32,
+	.write48 = b53_mdio_write48,
+	.write64 = b53_mdio_write64,
+};
+
+static int b53_phy_probe(struct phy_device *phydev)
+{
+	struct b53_device dev;
+	int ret;
+
+	/* allow the generic phy driver to take over */
+	if (phydev->addr != B53_PSEUDO_PHY && phydev->addr != 0)
+		return -ENODEV;
+
+	dev.current_page = 0xff;
+	dev.priv = phydev->bus;
+	dev.ops = &b53_mdio_ops;
+	dev.pdata = NULL;
+	mutex_init(&dev.reg_mutex);
+
+	ret = b53_switch_detect(&dev);
+	if (ret)
+		return ret;
+
+	if (is5325(&dev) || is5365(&dev))
+		phydev->supported = SUPPORTED_100baseT_Full;
+	else
+		phydev->supported = SUPPORTED_1000baseT_Full;
+
+	phydev->advertising = phydev->supported;
+
+	return 0;
+}
+
+static int b53_phy_config_init(struct phy_device *phydev)
+{
+	struct b53_device *dev;
+	int ret;
+
+	dev = b53_switch_alloc(&phydev->dev, &b53_mdio_ops, phydev->bus);
+	if (!dev)
+		return -ENOMEM;
+
+	/* we don't use page 0xff, so force a page set */
+	dev->current_page = 0xff;
+	/* force the ethX as alias */
+	dev->sw_dev.alias = phydev->attached_dev->name;
+
+	ret = b53_switch_register(dev);
+	if (ret) {
+		pr_info("failed to register switch: %i\n", ret);
+		return ret;
+	}
+
+	phydev->priv = dev;
+
+	return 0;
+}
+
+static void b53_phy_remove(struct phy_device *phydev)
+{
+	struct b53_device *priv = phydev->priv;
+
+	if (!priv)
+		return;
+
+	b53_switch_remove(priv);
+
+	phydev->priv = NULL;
+}
+
+static int b53_phy_config_aneg(struct phy_device *phydev)
+{
+	return 0;
+}
+
+static int b53_phy_read_status(struct phy_device *phydev)
+{
+	struct b53_device *priv = phydev->priv;
+
+	if (is5325(priv) || is5365(priv))
+		phydev->speed = 100;
+	else
+		phydev->speed = 1000;
+
+	phydev->duplex = DUPLEX_FULL;
+	phydev->link = 1;
+	phydev->state = PHY_RUNNING;
+
+	netif_carrier_on(phydev->attached_dev);
+	phydev->adjust_link(phydev->attached_dev);
+
+	return 0;
+}
+
+/* BCM5325, BCM539x */
+static struct phy_driver b53_phy_driver_id1 = {
+	.phy_id		= 0x0143bc00,
+	.name		= "Broadcom B53 (1)",
+	.phy_id_mask	= 0x1ffffc00,
+	.features	= 0,
+	.probe		= b53_phy_probe,
+	.remove		= b53_phy_remove,
+	.config_aneg	= b53_phy_config_aneg,
+	.config_init	= b53_phy_config_init,
+	.read_status	= b53_phy_read_status,
+	.driver = {
+		.owner = THIS_MODULE,
+	},
+};
+
+/* BCM53125 */
+static struct phy_driver b53_phy_driver_id2 = {
+	.phy_id		= 0x03625c00,
+	.name		= "Broadcom B53 (2)",
+	.phy_id_mask	= 0x1ffffc00,
+	.features	= 0,
+	.probe		= b53_phy_probe,
+	.remove		= b53_phy_remove,
+	.config_aneg	= b53_phy_config_aneg,
+	.config_init	= b53_phy_config_init,
+	.read_status	= b53_phy_read_status,
+	.driver = {
+		.owner = THIS_MODULE,
+	},
+};
+
+/* BCM5365 */
+static struct phy_driver b53_phy_driver_id3 = {
+	.phy_id		= 0x00406000,
+	.name		= "Broadcom B53 (3)",
+	.phy_id_mask	= 0x1ffffc00,
+	.features	= 0,
+	.probe		= b53_phy_probe,
+	.remove		= b53_phy_remove,
+	.config_aneg	= b53_phy_config_aneg,
+	.config_init	= b53_phy_config_init,
+	.read_status	= b53_phy_read_status,
+	.driver = {
+		.owner = THIS_MODULE,
+	},
+};
+
+int __init b53_phy_driver_register(void)
+{
+	int ret;
+
+	ret = phy_driver_register(&b53_phy_driver_id1);
+	if (ret)
+		return ret;
+
+	ret = phy_driver_register(&b53_phy_driver_id2);
+	if (ret)
+		goto err1;
+
+	ret = phy_driver_register(&b53_phy_driver_id3);
+	if (!ret)
+		return 0;
+
+	phy_driver_unregister(&b53_phy_driver_id2);
+err1:
+	phy_driver_unregister(&b53_phy_driver_id1);
+	return ret;
+}
+
+void __exit b53_phy_driver_unregister(void)
+{
+	phy_driver_unregister(&b53_phy_driver_id3);
+	phy_driver_unregister(&b53_phy_driver_id2);
+	phy_driver_unregister(&b53_phy_driver_id1);
+}
+
+module_init(b53_phy_driver_register);
+module_exit(b53_phy_driver_unregister);
+
+MODULE_DESCRIPTION("B53 MDIO access driver");
+MODULE_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/phy/b53/b53_phy_fixup.c b/drivers/net/phy/b53/b53_phy_fixup.c
new file mode 100644
index 0000000..72d1373
--- /dev/null
+++ b/drivers/net/phy/b53/b53_phy_fixup.c
@@ -0,0 +1,55 @@
+/*
+ * B53 PHY Fixup call
+ *
+ * Copyright (C) 2013 Jonas Gorski <jogo@openwrt.org>
+ *
+ * Permission to use, copy, modify, and/or distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/phy.h>
+
+#define B53_PSEUDO_PHY	0x1e /* Register Access Pseudo PHY */
+
+#define B53_BRCM_OUI_1	0x0143bc00
+#define B53_BRCM_OUI_2	0x03625c00
+#define B53_BRCM_OUI_3	0x00406000
+
+static int b53_phy_fixup(struct phy_device *dev)
+{
+	u32 phy_id;
+	struct mii_bus *bus = dev->bus;
+
+	if (dev->addr != B53_PSEUDO_PHY)
+		return 0;
+
+	/* read the first port's id */
+	phy_id = mdiobus_read(bus, 0, 2) << 16;
+	phy_id |= mdiobus_read(bus, 0, 3);
+
+	if ((phy_id & 0xfffffc00) == B53_BRCM_OUI_1 ||
+	    (phy_id & 0xfffffc00) == B53_BRCM_OUI_2 ||
+	    (phy_id & 0xfffffc00) == B53_BRCM_OUI_3) {
+		dev->phy_id = phy_id;
+	}
+
+	return 0;
+}
+
+int __init b53_phy_fixup_register(void)
+{
+	return phy_register_fixup_for_id(PHY_ANY_ID, b53_phy_fixup);
+}
+
+subsys_initcall(b53_phy_fixup_register);
diff --git a/drivers/net/phy/b53/b53_priv.h b/drivers/net/phy/b53/b53_priv.h
new file mode 100644
index 0000000..94eaa89
--- /dev/null
+++ b/drivers/net/phy/b53/b53_priv.h
@@ -0,0 +1,282 @@
+/*
+ * B53 common definitions
+ *
+ * Copyright (C) 2011-2013 Jonas Gorski <jogo@openwrt.org>
+ *
+ * Permission to use, copy, modify, and/or distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#ifndef __B53_PRIV_H
+#define __B53_PRIV_H
+
+#include <linux/kernel.h>
+#include <linux/mutex.h>
+#include <linux/swconfig.h>
+
+struct b53_device;
+
+struct b53_io_ops {
+	int (*read8)(struct b53_device *dev, u8 page, u8 reg, u8 *value);
+	int (*read16)(struct b53_device *dev, u8 page, u8 reg, u16 *value);
+	int (*read32)(struct b53_device *dev, u8 page, u8 reg, u32 *value);
+	int (*read48)(struct b53_device *dev, u8 page, u8 reg, u64 *value);
+	int (*read64)(struct b53_device *dev, u8 page, u8 reg, u64 *value);
+	int (*write8)(struct b53_device *dev, u8 page, u8 reg, u8 value);
+	int (*write16)(struct b53_device *dev, u8 page, u8 reg, u16 value);
+	int (*write32)(struct b53_device *dev, u8 page, u8 reg, u32 value);
+	int (*write48)(struct b53_device *dev, u8 page, u8 reg, u64 value);
+	int (*write64)(struct b53_device *dev, u8 page, u8 reg, u64 value);
+};
+
+enum {
+	BCM5325_DEVICE_ID = 0x25,
+	BCM5365_DEVICE_ID = 0x65,
+	BCM5395_DEVICE_ID = 0x95,
+	BCM5397_DEVICE_ID = 0x97,
+	BCM5398_DEVICE_ID = 0x98,
+	BCM53115_DEVICE_ID = 0x53115,
+	BCM53125_DEVICE_ID = 0x53125,
+	BCM63XX_DEVICE_ID = 0x6300,
+};
+
+#define B53_N_PORTS	9
+#define B53_N_PORTS_25	6
+
+struct b53_vlan {
+	unsigned int	members:B53_N_PORTS;
+	unsigned int	untag:B53_N_PORTS;
+};
+
+struct b53_port {
+	unsigned int	pvid:12;
+};
+
+struct b53_device {
+	struct switch_dev sw_dev;
+	struct b53_platform_data *pdata;
+
+	struct mutex reg_mutex;
+	const struct b53_io_ops *ops;
+
+	/* chip specific data */
+	u32 chip_id;
+	u8 core_rev;
+	u8 vta_regs[3];
+	u8 duplex_reg;
+	u8 jumbo_pm_reg;
+	u8 jumbo_size_reg;
+	int reset_gpio;
+
+	/* used ports mask */
+	u16 enabled_ports;
+
+	/* connect specific data */
+	u8 current_page;
+	struct device *dev;
+	void *priv;
+
+	/* run time configuration */
+	unsigned enable_vlan:1;
+	unsigned enable_jumbo:1;
+	unsigned allow_vid_4095:1;
+
+	struct b53_port *ports;
+	struct b53_vlan *vlans;
+
+	char *buf;
+};
+
+#define b53_for_each_port(dev, i) \
+	for (i = 0; i < B53_N_PORTS; i++) \
+		if (dev->enabled_ports & BIT(i))
+
+
+
+static inline int is5325(struct b53_device *dev)
+{
+	return dev->chip_id == BCM5325_DEVICE_ID;
+}
+
+static inline int is5365(struct b53_device *dev)
+{
+	return 0;
+}
+
+static inline int is5397_98(struct b53_device *dev)
+{
+	return dev->chip_id == BCM5397_DEVICE_ID ||
+		dev->chip_id == BCM5398_DEVICE_ID;
+}
+
+static inline int is539x(struct b53_device *dev)
+{
+	return dev->chip_id == BCM5395_DEVICE_ID ||
+		dev->chip_id == BCM5397_DEVICE_ID ||
+		dev->chip_id == BCM5398_DEVICE_ID;
+}
+
+static inline int is531x5(struct b53_device *dev)
+{
+	return dev->chip_id == BCM53115_DEVICE_ID ||
+		dev->chip_id == BCM53125_DEVICE_ID;
+}
+
+static inline int is63xx(struct b53_device *dev)
+{
+	return 0;
+}
+
+#define B53_CPU_PORT_25	5
+#define B53_CPU_PORT	8
+
+static inline int is_cpu_port(struct b53_device *dev, int port)
+{
+	return dev->sw_dev.cpu_port == port;
+}
+
+static inline struct b53_device *sw_to_b53(struct switch_dev *sw)
+{
+	return container_of(sw, struct b53_device, sw_dev);
+}
+
+struct b53_device *b53_switch_alloc(struct device *base, struct b53_io_ops *ops,
+				    void *priv);
+
+int b53_switch_detect(struct b53_device *dev);
+
+int b53_switch_register(struct b53_device *dev);
+
+static inline void b53_switch_remove(struct b53_device *dev)
+{
+	unregister_switch(&dev->sw_dev);
+}
+
+static inline int b53_read8(struct b53_device *dev, u8 page, u8 reg, u8 *val)
+{
+	int ret;
+
+	mutex_lock(&dev->reg_mutex);
+	ret = dev->ops->read8(dev, page, reg, val);
+	mutex_unlock(&dev->reg_mutex);
+
+	return ret;
+}
+
+static inline int b53_read16(struct b53_device *dev, u8 page, u8 reg, u16 *val)
+{
+	int ret;
+
+	mutex_lock(&dev->reg_mutex);
+	ret = dev->ops->read16(dev, page, reg, val);
+	mutex_unlock(&dev->reg_mutex);
+
+	return ret;
+}
+
+static inline int b53_read32(struct b53_device *dev, u8 page, u8 reg, u32 *val)
+{
+	int ret;
+
+	mutex_lock(&dev->reg_mutex);
+	ret = dev->ops->read32(dev, page, reg, val);
+	mutex_unlock(&dev->reg_mutex);
+
+	return ret;
+}
+
+static inline int b53_read48(struct b53_device *dev, u8 page, u8 reg, u64 *val)
+{
+	int ret;
+
+	mutex_lock(&dev->reg_mutex);
+	ret = dev->ops->read48(dev, page, reg, val);
+	mutex_unlock(&dev->reg_mutex);
+
+	return ret;
+}
+
+static inline int b53_read64(struct b53_device *dev, u8 page, u8 reg, u64 *val)
+{
+	int ret;
+
+	mutex_lock(&dev->reg_mutex);
+	ret = dev->ops->read64(dev, page, reg, val);
+	mutex_unlock(&dev->reg_mutex);
+
+	return ret;
+}
+
+static inline int b53_write8(struct b53_device *dev, u8 page, u8 reg, u8 value)
+{
+	int ret;
+
+	mutex_lock(&dev->reg_mutex);
+	ret = dev->ops->write8(dev, page, reg, value);
+	mutex_unlock(&dev->reg_mutex);
+
+	return ret;
+}
+
+static inline int b53_write16(struct b53_device *dev, u8 page, u8 reg,
+			      u16 value)
+{
+	int ret;
+
+	mutex_lock(&dev->reg_mutex);
+	ret = dev->ops->write16(dev, page, reg, value);
+	mutex_unlock(&dev->reg_mutex);
+
+	return ret;
+}
+
+static inline int b53_write32(struct b53_device *dev, u8 page, u8 reg,
+			      u32 value)
+{
+	int ret;
+
+	mutex_lock(&dev->reg_mutex);
+	ret = dev->ops->write32(dev, page, reg, value);
+	mutex_unlock(&dev->reg_mutex);
+
+	return ret;
+}
+
+static inline int b53_write48(struct b53_device *dev, u8 page, u8 reg,
+			      u64 value)
+{
+	int ret;
+
+	mutex_lock(&dev->reg_mutex);
+	ret = dev->ops->write48(dev, page, reg, value);
+	mutex_unlock(&dev->reg_mutex);
+
+	return ret;
+}
+
+static inline int b53_write64(struct b53_device *dev, u8 page, u8 reg,
+			       u64 value)
+{
+	int ret;
+
+	mutex_lock(&dev->reg_mutex);
+	ret = dev->ops->write64(dev, page, reg, value);
+	mutex_unlock(&dev->reg_mutex);
+
+	return ret;
+}
+
+static inline int b53_switch_get_reset_gpio(struct b53_device *dev)
+{
+	return -ENOENT;
+}
+#endif
diff --git a/drivers/net/phy/b53/b53_regs.h b/drivers/net/phy/b53/b53_regs.h
new file mode 100644
index 0000000..7018ff4
--- /dev/null
+++ b/drivers/net/phy/b53/b53_regs.h
@@ -0,0 +1,311 @@
+/*
+ * B53 register definitions
+ *
+ * Copyright (C) 2004 Broadcom Corporation
+ * Copyright (C) 2011-2013 Jonas Gorski <jogo@openwrt.org>
+ *
+ * Permission to use, copy, modify, and/or distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#ifndef __B53_REGS_H
+#define __B53_REGS_H
+
+/* Management Port (SMP) Page offsets */
+#define B53_CTRL_PAGE			0x00 /* Control */
+#define B53_STAT_PAGE			0x01 /* Status */
+#define B53_MGMT_PAGE			0x02 /* Management Mode */
+#define B53_MIB_AC_PAGE			0x03 /* MIB Autocast */
+#define B53_ARLCTRL_PAGE		0x04 /* ARL Control */
+#define B53_ARLIO_PAGE			0x05 /* ARL Access */
+#define B53_FRAMEBUF_PAGE		0x06 /* Management frame access */
+#define B53_MEM_ACCESS_PAGE		0x08 /* Memory access */
+
+/* PHY Registers */
+#define B53_PORT_MII_PAGE(i)		(0x10 + i) /* Port i MII Registers */
+#define B53_IM_PORT_PAGE		0x18 /* Inverse MII Port (to EMAC) */
+#define B53_ALL_PORT_PAGE		0x19 /* All ports MII (broadcast) */
+
+/* MIB registers */
+#define B53_MIB_PAGE(i)			(0x20 + i)
+
+/* Quality of Service (QoS) Registers */
+#define B53_QOS_PAGE			0x30
+
+/* Port VLAN Page */
+#define B53_PVLAN_PAGE			0x31
+
+/* VLAN Registers */
+#define B53_VLAN_PAGE			0x34
+
+/* Jumbo Frame Registers */
+#define B53_JUMBO_PAGE			0x40
+
+/*************************************************************************
+ * Control Page registers
+ *************************************************************************/
+
+/* Port Control Register (8 bit) */
+#define B53_PORT_CTRL(i)		(0x00 + i)
+#define   PORT_CTRL_RX_DISABLE		BIT(0)
+#define   PORT_CTRL_TX_DISABLE		BIT(1)
+#define   PORT_CTRL_RX_BCST_EN		BIT(2) /* Broadcast RX (P8 only) */
+#define   PORT_CTRL_RX_MCST_EN		BIT(3) /* Multicast RX (P8 only) */
+#define   PORT_CTRL_RX_UCST_EN		BIT(4) /* Unicast RX (P8 only) */
+#define	  PORT_CTRL_STP_STATE_S		5
+#define   PORT_CTRL_STP_STATE_MASK	(0x3 << PORT_CTRL_STP_STATE_S)
+
+/* SMP Control Register (8 bit) */
+#define B53_SMP_CTRL			0x0a
+
+/* Switch Mode Control Register (8 bit) */
+#define B53_SWITCH_MODE			0x0b
+#define   SM_SW_FWD_MODE		BIT(0)	/* 1 = Managed Mode */
+#define   SM_SW_FWD_EN			BIT(1)	/* Forwarding Enable */
+
+/* IMP Port state override register (8 bit) */
+#define B53_PORT_OVERRIDE_CTRL		0x0e
+#define   PORT_OVERRIDE_LINK		BIT(0)
+#define   PORT_OVERRIDE_HALF_DUPLEX	BIT(1) /* 0 = Full Duplex */
+#define   PORT_OVERRIDE_SPEED_S		2
+#define   PORT_OVERRIDE_SPEED_10M	(0 << PORT_OVERRIDE_SPEED_S)
+#define   PORT_OVERRIDE_SPEED_100M	(1 << PORT_OVERRIDE_SPEED_S)
+#define   PORT_OVERRIDE_SPEED_1000M	(2 << PORT_OVERRIDE_SPEED_S)
+#define   PORT_OVERRIDE_RV_MII_25	BIT(4) /* BCM5325 only */
+#define   PORT_OVERRIDE_RX_FLOW		BIT(4)
+#define   PORT_OVERRIDE_TX_FLOW		BIT(5)
+#define   PORT_OVERRIDE_EN		BIT(7) /* Use the register contents */
+
+/* Power-down mode control */
+#define B53_PD_MODE_CTRL_25		0x0f
+
+/* IP Multicast control (8 bit) */
+#define B53_IP_MULTICAST_CTRL		0x21
+#define  B53_IPMC_FWD_EN		BIT(1)
+#define  B53_UC_FWD_EN			BIT(6)
+#define  B53_MC_FWD_EN			BIT(7)
+
+/* (16 bit) */
+#define B53_UC_FLOOD_MASK		0x32
+#define B53_MC_FLOOD_MASK		0x34
+#define B53_IPMC_FLOOD_MASK		0x36
+
+/* Software reset register (8 bit) */
+#define B53_SOFTRESET			0x79
+
+/* Fast Aging Control register (8 bit) */
+#define B53_FAST_AGE_CTRL		0x88
+#define   FAST_AGE_STATIC		BIT(0)
+#define   FAST_AGE_DYNAMIC		BIT(1)
+#define   FAST_AGE_PORT			BIT(2)
+#define   FAST_AGE_VLAN			BIT(3)
+#define   FAST_AGE_STP			BIT(4)
+#define   FAST_AGE_MC			BIT(5)
+#define   FAST_AGE_DONE			BIT(7)
+
+/*************************************************************************
+ * Status Page registers
+ *************************************************************************/
+
+/* Link Status Summary Register (16bit) */
+#define B53_LINK_STAT			0x00
+
+/* Link Status Change Register (16 bit) */
+#define B53_LINK_STAT_CHANGE		0x02
+
+/* Port Speed Summary Register (16 bit for FE, 32 bit for GE) */
+#define B53_SPEED_STAT			0x04
+#define  SPEED_PORT_FE(reg, port)	(((reg) >> (port)) & 1)
+#define  SPEED_PORT_GE(reg, port)	(((reg) >> 2 * (port)) & 3)
+#define  SPEED_STAT_10M			0
+#define  SPEED_STAT_100M		1
+#define  SPEED_STAT_1000M		2
+
+/* Duplex Status Summary (16 bit) */
+#define B53_DUPLEX_STAT_FE		0x06
+#define B53_DUPLEX_STAT_GE		0x08
+#define B53_DUPLEX_STAT_63XX		0x0c
+
+/* Revision ID register for BCM5325 */
+#define B53_REV_ID_25			0x50
+
+/* Strap Value (48 bit) */
+#define B53_STRAP_VALUE			0x70
+#define   SV_GMII_CTRL_115		BIT(27)
+
+/*************************************************************************
+ * Management Mode Page Registers
+ *************************************************************************/
+
+/* Global Management Config Register (8 bit) */
+#define B53_GLOBAL_CONFIG		0x00
+#define   GC_RESET_MIB			0x01
+#define   GC_RX_BPDU_EN			0x02
+#define   GC_MIB_AC_HDR_EN		0x10
+#define   GC_MIB_AC_EN			0x20
+#define   GC_FRM_MGMT_PORT_M		0xC0
+#define   GC_FRM_MGMT_PORT_04		0x00
+#define   GC_FRM_MGMT_PORT_MII		0x80
+
+/* Device ID register (8 or 32 bit) */
+#define B53_DEVICE_ID			0x30
+
+/* Revision ID register (8 bit) */
+#define B53_REV_ID			0x40
+
+/*************************************************************************
+ * ARL Access Page Registers
+ *************************************************************************/
+
+/* VLAN Table Access Register (8 bit) */
+#define B53_VT_ACCESS			0x80
+#define B53_VT_ACCESS_9798		0x60 /* for BCM5397/BCM5398 */
+#define B53_VT_ACCESS_63XX		0x60 /* for BCM6328/62/68 */
+#define   VTA_CMD_WRITE			0
+#define   VTA_CMD_READ			1
+#define   VTA_CMD_CLEAR			2
+#define   VTA_START_CMD			BIT(7)
+
+/* VLAN Table Index Register (16 bit) */
+#define B53_VT_INDEX			0x81
+#define B53_VT_INDEX_9798		0x61
+#define B53_VT_INDEX_63XX		0x62
+
+/* VLAN Table Entry Register (32 bit) */
+#define B53_VT_ENTRY			0x83
+#define B53_VT_ENTRY_9798		0x63
+#define B53_VT_ENTRY_63XX		0x64
+#define   VTE_MEMBERS			0x1ff
+#define   VTE_UNTAG_S			9
+#define   VTE_UNTAG			(0x1ff << 9)
+
+/*************************************************************************
+ * Port VLAN Registers
+ *************************************************************************/
+
+/* Port VLAN mask (16 bit) IMP port is always 8, also on 5325 & co */
+#define B53_PVLAN_PORT_MASK(i)		((i) * 2)
+
+/*************************************************************************
+ * 802.1Q Page Registers
+ *************************************************************************/
+
+/* Global QoS Control (8 bit) */
+#define B53_QOS_GLOBAL_CTL		0x00
+
+/* Enable 802.1Q for individual Ports (16 bit) */
+#define B53_802_1P_EN			0x04
+
+/*************************************************************************
+ * VLAN Page Registers
+ *************************************************************************/
+
+/* VLAN Control 0 (8 bit) */
+#define B53_VLAN_CTRL0			0x00
+#define   VC0_8021PF_CTRL_MASK		0x3
+#define   VC0_8021PF_CTRL_NONE		0x0
+#define   VC0_8021PF_CTRL_CHANGE_PRI	0x1
+#define   VC0_8021PF_CTRL_CHANGE_VID	0x2
+#define   VC0_8021PF_CTRL_CHANGE_BOTH	0x3
+#define   VC0_8021QF_CTRL_MASK		0xc
+#define   VC0_8021QF_CTRL_CHANGE_PRI	0x1
+#define   VC0_8021QF_CTRL_CHANGE_VID	0x2
+#define   VC0_8021QF_CTRL_CHANGE_BOTH	0x3
+#define   VC0_RESERVED_1		BIT(1)
+#define   VC0_DROP_VID_MISS		BIT(4)
+#define   VC0_VID_HASH_VID		BIT(5)
+#define   VC0_VID_CHK_EN		BIT(6)	/* Use VID,DA or VID,SA */
+#define   VC0_VLAN_EN			BIT(7)	/* 802.1Q VLAN Enabled */
+
+/* VLAN Control 1 (8 bit) */
+#define B53_VLAN_CTRL1			0x01
+#define   VC1_RX_MCST_TAG_EN		BIT(1)
+#define   VC1_RX_MCST_FWD_EN		BIT(2)
+#define   VC1_RX_MCST_UNTAG_EN		BIT(3)
+
+/* VLAN Control 2 (8 bit) */
+#define B53_VLAN_CTRL2			0x02
+
+/* VLAN Control 3 (8 bit when BCM5325, 16 bit else) */
+#define B53_VLAN_CTRL3			0x03
+#define B53_VLAN_CTRL3_63XX		0x04
+#define   VC3_MAXSIZE_1532		BIT(6) /* 5325 only */
+#define   VC3_HIGH_8BIT_EN		BIT(7) /* 5325 only */
+
+/* VLAN Control 4 (8 bit) */
+#define B53_VLAN_CTRL4			0x05
+#define B53_VLAN_CTRL4_25		0x04
+#define B53_VLAN_CTRL4_63XX		0x06
+#define   VC4_ING_VID_CHECK_S		6
+#define   VC4_ING_VID_CHECK_MASK	(0x3 << VC4_ING_VID_CHECK_S)
+#define   VC4_ING_VID_VIO_FWD		0 /* forward, but do not learn */
+#define   VC4_ING_VID_VIO_DROP		1 /* drop VID violations */
+#define   VC4_NO_ING_VID_CHK		2 /* do not check */
+#define   VC4_ING_VID_VIO_TO_IMP	3 /* redirect to MII port */
+
+/* VLAN Control 5 (8 bit) */
+#define B53_VLAN_CTRL5			0x06
+#define B53_VLAN_CTRL5_25		0x05
+#define B53_VLAN_CTRL5_63XX		0x07
+#define   VC5_VID_FFF_EN		BIT(2)
+#define   VC5_DROP_VTABLE_MISS		BIT(3)
+
+/* VLAN Control 6 (8 bit) */
+#define B53_VLAN_CTRL6			0x07
+#define B53_VLAN_CTRL6_63XX		0x08
+
+/* VLAN Table Access Register (16 bit) */
+#define B53_VLAN_TABLE_ACCESS_25	0x06	/* BCM5325E/5350 */
+#define B53_VLAN_TABLE_ACCESS_65	0x08	/* BCM5365 */
+#define   VTA_VID_LOW_MASK_25		0xf
+#define   VTA_VID_LOW_MASK_65		0xff
+#define   VTA_VID_HIGH_S_25		4
+#define   VTA_VID_HIGH_S_65		8
+#define   VTA_VID_HIGH_MASK_25		(0xff << VTA_VID_HIGH_S_25E)
+#define   VTA_VID_HIGH_MASK_65		(0xf << VTA_VID_HIGH_S_65)
+#define   VTA_RW_STATE			BIT(12)
+#define   VTA_RW_STATE_RD		0
+#define   VTA_RW_STATE_WR		BIT(12)
+#define   VTA_RW_OP_EN			BIT(13)
+
+/* VLAN Read/Write Registers for (16/32 bit) */
+#define B53_VLAN_WRITE_25		0x08
+#define B53_VLAN_WRITE_65		0x0a
+#define B53_VLAN_READ			0x0c
+#define   VA_MEMBER_MASK		0x3f
+#define   VA_UNTAG_S			6
+#define   VA_UNTAG_MASK			(0x3f << VA_UNTAG_S)
+#define   VA_VID_HIGH_S			12
+#define   VA_VID_HIGH_MASK		(0xffff << VA_VID_HIGH_S)
+#define   VA_VALID_25			BIT(20)
+#define   VA_VALID_25_R4		BIT(24)
+#define   VA_VALID_65			BIT(14)
+
+/* VLAN Port Default Tag (16 bit) */
+#define B53_VLAN_PORT_DEF_TAG(i)	(0x10 + 2 * (i))
+
+/*************************************************************************
+ * Jumbo Frame Page Registers
+ *************************************************************************/
+
+/* Jumbo Enable Port Mask (bit i == port i enabled) (32 bit) */
+#define B53_JUMBO_PORT_MASK		0x01
+#define B53_JUMBO_PORT_MASK_63XX	0x04
+#define   JPM_10_100_JUMBO_EN		BIT(24) /* GigE always enabled */
+
+/* Good Frame Max Size without 802.1Q TAG (16 bit) */
+#define B53_JUMBO_MAX_SIZE		0x05
+#define B53_JUMBO_MAX_SIZE_63XX		0x08
+#define   JMS_MIN_SIZE			1518
+#define   JMS_MAX_SIZE			9724
+
+#endif /* !__B53_REGS_H */
diff --git a/drivers/net/phy/b53/b53_spi.c b/drivers/net/phy/b53/b53_spi.c
new file mode 100644
index 0000000..6050fea
--- /dev/null
+++ b/drivers/net/phy/b53/b53_spi.c
@@ -0,0 +1,329 @@
+/*
+ * B53 register access through SPI
+ *
+ * Copyright (C) 2011-2013 Jonas Gorski <jogo@openwrt.org>
+ *
+ * Permission to use, copy, modify, and/or distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include <asm/unaligned.h>
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/spi/spi.h>
+#include <linux/platform_data/b53.h>
+
+#include "b53_priv.h"
+
+#define B53_SPI_DATA		0xf0
+
+#define B53_SPI_STATUS		0xfe
+#define B53_SPI_CMD_SPIF	BIT(7)
+#define B53_SPI_CMD_RACK	BIT(5)
+
+#define B53_SPI_CMD_READ	0x00
+#define B53_SPI_CMD_WRITE	0x01
+#define B53_SPI_CMD_NORMAL	0x60
+#define B53_SPI_CMD_FAST	0x10
+
+#define B53_SPI_PAGE_SELECT	0xff
+
+static inline int b53_spi_read_reg(struct spi_device *spi, u8 reg, u8 *val,
+				     unsigned len)
+{
+	u8 txbuf[2];
+
+	txbuf[0] = B53_SPI_CMD_NORMAL | B53_SPI_CMD_READ;
+	txbuf[1] = reg;
+
+	return spi_write_then_read(spi, txbuf, 2, val, len);
+}
+
+static inline int b53_spi_clear_status(struct spi_device *spi)
+{
+	unsigned int i;
+	u8 rxbuf;
+	int ret;
+
+	for (i = 0; i < 10; i++) {
+		ret = b53_spi_read_reg(spi, B53_SPI_STATUS, &rxbuf, 1);
+		if (ret)
+			return ret;
+
+		if (!(rxbuf & B53_SPI_CMD_SPIF))
+			break;
+
+		mdelay(1);
+	}
+
+	if (i == 10)
+		return -EIO;
+
+	return 0;
+}
+
+static inline int b53_spi_set_page(struct spi_device *spi, u8 page)
+{
+	u8 txbuf[3];
+
+	txbuf[0] = B53_SPI_CMD_NORMAL | B53_SPI_CMD_WRITE;
+	txbuf[1] = B53_SPI_PAGE_SELECT;
+	txbuf[2] = page;
+
+	return spi_write(spi, txbuf, sizeof(txbuf));
+}
+
+static inline int b53_prepare_reg_access(struct spi_device *spi, u8 page)
+{
+	int ret = b53_spi_clear_status(spi);
+	if (ret)
+		return ret;
+
+	return b53_spi_set_page(spi, page);
+}
+
+static int b53_spi_prepare_reg_read(struct spi_device *spi, u8 reg)
+{
+	u8 rxbuf;
+	int retry_count;
+	int ret;
+
+	ret = b53_spi_read_reg(spi, reg, &rxbuf, 1);
+	if (ret)
+		return ret;
+
+	for (retry_count = 0; retry_count < 10; retry_count++) {
+		ret = b53_spi_read_reg(spi, B53_SPI_STATUS, &rxbuf, 1);
+		if (ret)
+			return ret;
+
+		if (rxbuf & B53_SPI_CMD_RACK)
+			break;
+
+		mdelay(1);
+	}
+
+	if (retry_count == 10)
+		return -EIO;
+
+	return 0;
+}
+
+static int b53_spi_read(struct b53_device *dev, u8 page, u8 reg, u8 *data,
+			unsigned len)
+{
+	struct spi_device *spi = dev->priv;
+	int ret;
+
+	ret = b53_prepare_reg_access(spi, page);
+	if (ret)
+		return ret;
+
+	ret = b53_spi_prepare_reg_read(spi, reg);
+	if (ret)
+		return ret;
+
+	return b53_spi_read_reg(spi, B53_SPI_DATA, data, len);
+}
+
+static int b53_spi_read8(struct b53_device *dev, u8 page, u8 reg, u8 *val)
+{
+	return b53_spi_read(dev, page, reg, val, 1);
+}
+
+static int b53_spi_read16(struct b53_device *dev, u8 page, u8 reg, u16 *val)
+{
+	int ret = b53_spi_read(dev, page, reg, (u8 *)val, 2);
+	if (!ret)
+		*val = le16_to_cpu(*val);
+
+	return ret;
+}
+
+static int b53_spi_read32(struct b53_device *dev, u8 page, u8 reg, u32 *val)
+{
+	int ret = b53_spi_read(dev, page, reg, (u8 *)val, 4);
+	if (!ret)
+		*val = le32_to_cpu(*val);
+
+	return ret;
+}
+
+static int b53_spi_read48(struct b53_device *dev, u8 page, u8 reg, u64 *val)
+{
+	int ret;
+
+	*val = 0;
+	ret = b53_spi_read(dev, page, reg, (u8 *)val, 6);
+	if (!ret)
+		*val = le64_to_cpu(*val);
+
+	return ret;
+}
+
+static int b53_spi_read64(struct b53_device *dev, u8 page, u8 reg, u64 *val)
+{
+	int ret = b53_spi_read(dev, page, reg, (u8 *)val, 8);
+	if (!ret)
+		*val = le64_to_cpu(*val);
+
+	return ret;
+}
+
+static int b53_spi_write8(struct b53_device *dev, u8 page, u8 reg, u8 value)
+{
+	struct spi_device *spi = dev->priv;
+	int ret;
+	u8 txbuf[3];
+
+	ret = b53_prepare_reg_access(spi, page);
+	if (ret)
+		return ret;
+
+	txbuf[0] = B53_SPI_CMD_NORMAL | B53_SPI_CMD_WRITE;
+	txbuf[1] = reg;
+	txbuf[2] = value;
+
+	return spi_write(spi, txbuf, sizeof(txbuf));
+}
+
+static int b53_spi_write16(struct b53_device *dev, u8 page, u8 reg, u16 value)
+{
+	struct spi_device *spi = dev->priv;
+	int ret;
+	u8 txbuf[4];
+
+	ret = b53_prepare_reg_access(spi, page);
+	if (ret)
+		return ret;
+
+	txbuf[0] = B53_SPI_CMD_NORMAL | B53_SPI_CMD_WRITE;
+	txbuf[1] = reg;
+	put_unaligned_le16(value, &txbuf[2]);
+
+	return spi_write(spi, txbuf, sizeof(txbuf));
+}
+
+static int b53_spi_write32(struct b53_device *dev, u8 page, u8 reg, u32 value)
+{
+	struct spi_device *spi = dev->priv;
+	int ret;
+	u8 txbuf[6];
+
+	ret = b53_prepare_reg_access(spi, page);
+	if (ret)
+		return ret;
+
+	txbuf[0] = B53_SPI_CMD_NORMAL | B53_SPI_CMD_WRITE;
+	txbuf[1] = reg;
+	put_unaligned_le32(value, &txbuf[2]);
+
+	return spi_write(spi, txbuf, sizeof(txbuf));
+}
+
+static int b53_spi_write48(struct b53_device *dev, u8 page, u8 reg, u64 value)
+{
+	struct spi_device *spi = dev->priv;
+	int ret;
+	u8 txbuf[10];
+
+	ret = b53_prepare_reg_access(spi, page);
+	if (ret)
+		return ret;
+
+	txbuf[0] = B53_SPI_CMD_NORMAL | B53_SPI_CMD_WRITE;
+	txbuf[1] = reg;
+	put_unaligned_le64(value, &txbuf[2]);
+
+	return spi_write(spi, txbuf, sizeof(txbuf) - 2);
+}
+
+static int b53_spi_write64(struct b53_device *dev, u8 page, u8 reg, u64 value)
+{
+	struct spi_device *spi = dev->priv;
+	int ret;
+	u8 txbuf[10];
+
+	ret = b53_prepare_reg_access(spi, page);
+	if (ret)
+		return ret;
+
+	txbuf[0] = B53_SPI_CMD_NORMAL | B53_SPI_CMD_WRITE;
+	txbuf[1] = reg;
+	put_unaligned_le64(value, &txbuf[2]);
+
+	return spi_write(spi, txbuf, sizeof(txbuf));
+}
+
+static struct b53_io_ops b53_spi_ops = {
+	.read8 = b53_spi_read8,
+	.read16 = b53_spi_read16,
+	.read32 = b53_spi_read32,
+	.read48 = b53_spi_read48,
+	.read64 = b53_spi_read64,
+	.write8 = b53_spi_write8,
+	.write16 = b53_spi_write16,
+	.write32 = b53_spi_write32,
+	.write48 = b53_spi_write48,
+	.write64 = b53_spi_write64,
+};
+
+static int b53_spi_probe(struct spi_device *spi)
+{
+	struct b53_device *dev;
+	int ret;
+
+	dev = b53_switch_alloc(&spi->dev, &b53_spi_ops, spi);
+	if (!dev)
+		return -ENOMEM;
+
+	if (spi->dev.platform_data)
+		dev->pdata = spi->dev.platform_data;
+
+	ret = b53_switch_register(dev);
+	if (ret)
+		return ret;
+
+	spi->dev.platform_data = dev;
+
+	return 0;
+}
+
+static int b53_spi_remove(struct spi_device *spi)
+{
+	struct b53_device *dev = spi->dev.platform_data;
+
+	if (dev) {
+		struct b53_platform_data *pdata = dev->pdata;
+		b53_switch_remove(dev);
+		spi->dev.platform_data = pdata;
+	}
+
+	return 0;
+}
+
+static struct spi_driver b53_spi_driver = {
+	.driver = {
+		.name	= "b53-switch",
+		.bus	= &spi_bus_type,
+		.owner	= THIS_MODULE,
+	},
+	.probe	= b53_spi_probe,
+	.remove	= b53_spi_remove,
+};
+
+module_spi_driver(b53_spi_driver);
+
+MODULE_AUTHOR("Jonas Gorski <jogo@openwrt.org>");
+MODULE_DESCRIPTION("B53 SPI access driver");
+MODULE_LICENSE("Dual BSD/GPL");
diff --git a/include/linux/platform_data/b53.h b/include/linux/platform_data/b53.h
new file mode 100644
index 0000000..5367b4e
--- /dev/null
+++ b/include/linux/platform_data/b53.h
@@ -0,0 +1,32 @@
+/*
+ * B53 platform data
+ *
+ * Copyright (C) 2013 Jonas Gorski <jogo@openwrt.org>
+ *
+ * Permission to use, copy, modify, and/or distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#ifndef __B53_H
+#define __B53_H
+
+#include <linux/kernel.h>
+
+struct b53_platform_data {
+	u32 chip_id;
+	u16 enabled_ports;
+
+	/* allow to specify an ethX alias */
+	const char *alias;
+};
+
+#endif
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 4/4 net-next] net: phy: add fake switch driver
  2013-10-22 18:23 [PATCH 0/4 net-next] net: phy: add Generic Netlink switch configuration API Florian Fainelli
                   ` (2 preceding siblings ...)
  2013-10-22 18:23 ` [PATCH 3/4 net-next] net: phy: add Broadcom B53 switch driver Florian Fainelli
@ 2013-10-22 18:23 ` Florian Fainelli
  3 siblings, 0 replies; 41+ messages in thread
From: Florian Fainelli @ 2013-10-22 18:23 UTC (permalink / raw)
  To: netdev; +Cc: davem, s.hauer, nbd, blogic, jogo, gary, Florian Fainelli

Add a fake switch driver which can be used to test both the kernel
swconfig API and user-land swconfig tool for regression and features
additions.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/phy/Kconfig          |   8 ++
 drivers/net/phy/Makefile         |   1 +
 drivers/net/phy/swconfig-hwsim.c | 230 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 239 insertions(+)
 create mode 100644 drivers/net/phy/swconfig-hwsim.c

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index d02ed5a..6bb940e 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -18,6 +18,14 @@ config SWCONFIG
 	  Switch configuration API using netlink. This allows
 	  you to configure the VLAN features of certain switches.
 
+config SWCONFIG_HWSIM
+	tristate "Fake switch driver"
+	depends on SWCONFIG
+	---help---
+	  Fake switch driver which simulates an 8 port Gigabit switch
+	  for regression testing of the swconfig kernel and user-space
+	  API.
+
 comment "MII PHY device drivers"
 
 config AT803X_PHY
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index 1998034..b72405a 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -4,6 +4,7 @@ libphy-objs			:= phy.o phy_device.o mdio_bus.o
 
 obj-$(CONFIG_PHYLIB)		+= libphy.o
 obj-$(CONFIG_SWCONFIG)		+= swconfig.o
+obj-$(CONFIG_SWCONFIG_HWSIM)	+= swconfig-hwsim.o
 obj-$(CONFIG_MARVELL_PHY)	+= marvell.o
 obj-$(CONFIG_DAVICOM_PHY)	+= davicom.o
 obj-$(CONFIG_CICADA_PHY)	+= cicada.o
diff --git a/drivers/net/phy/swconfig-hwsim.c b/drivers/net/phy/swconfig-hwsim.c
new file mode 100644
index 0000000..39cb141
--- /dev/null
+++ b/drivers/net/phy/swconfig-hwsim.c
@@ -0,0 +1,230 @@
+/*
+ * Simulation swconfig driver
+ *
+ * Copyright (C) 2013, Florian Fainelli <f.fainelli@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/netdevice.h>
+#include <linux/swconfig.h>
+
+struct swconfig_hwsim_port_state {
+	unsigned int speed;
+	unsigned int link;
+	unsigned int duplex;
+};
+
+#define SWCONFIG_HWSIM_NUM_PORTS	8
+
+struct swconfig_hwsim_state {
+	struct switch_dev dev;
+	struct net_device *loopback;
+	char buf[255];
+	struct swconfig_hwsim_port_state ports[SWCONFIG_HWSIM_NUM_PORTS];
+};
+
+#define get_state(_dev) container_of((_dev), struct swconfig_hwsim_state, dev)
+
+static int swconfig_hwsim_get_name(struct switch_dev *dev,
+				   const struct switch_attr *attr,
+				   struct switch_val *val)
+{
+	struct swconfig_hwsim_state *state = get_state(dev);
+
+	val->value.s = state->dev.name;
+
+	return 0;
+}
+
+static int swconfig_hwsim_get_port_status(struct switch_dev *dev,
+					 const struct switch_attr *attr,
+					 struct switch_val *val)
+{
+	struct swconfig_hwsim_state *state = get_state(dev);
+	struct swconfig_hwsim_port_state *port_state;
+	int port = val->port_vlan;
+	char *buf = state->buf;
+	int len;
+
+	port_state = &state->ports[port];
+
+	if (!port_state->link)
+		len = snprintf(buf, sizeof(state->buf), "down");
+	else
+		len = snprintf(buf, sizeof(state->buf),
+				"up, %d Mbps, %s duplex",
+				port_state->speed,
+				port_state->duplex ? "full" : "half");
+
+	buf[len] = '\0';
+
+	val->value.s = buf;
+
+	return 0;
+}
+
+static int swconfig_hwsim_get_link(struct switch_dev *dev,
+				   const struct switch_attr *attr,
+				   struct switch_val *val)
+{
+	struct swconfig_hwsim_state *state = get_state(dev);
+	struct swconfig_hwsim_port_state *port_state;
+	int port = val->port_vlan;
+
+	port_state = &state->ports[port];
+
+	if (port_state->link)
+		val->value.i = port_state->speed;
+	else
+		val->value.i = 0;
+
+	return 0;
+}
+
+static int swconfig_hwsim_set_link(struct switch_dev *dev,
+				   const struct switch_attr *attr,
+				   struct switch_val *val)
+{
+	struct swconfig_hwsim_state *state = get_state(dev);
+	struct swconfig_hwsim_port_state *port_state;
+	int port = val->port_vlan;
+
+	/* Validate user input speed */
+	switch (val->value.i) {
+	case SWITCH_PORT_SPEED_UNKNOWN:
+	case SWITCH_PORT_SPEED_10:
+	case SWITCH_PORT_SPEED_100:
+	case SWITCH_PORT_SPEED_1000:
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	port_state = &state->ports[port];
+
+	if (val->value.i == SWITCH_PORT_SPEED_UNKNOWN)
+		port_state->link = 0;
+	else {
+		port_state->link = 1;
+		port_state->speed = val->value.i;
+	}
+
+	return 0;
+}
+
+enum swconfig_hwsim_port {
+	SWCONFIG_HWSIM_PORT_STATUS,
+	SWCONFIG_HWSIM_PORT_LINK,
+};
+
+static const struct switch_attr swconfig_hwsim_attr_port[] = {
+	[SWCONFIG_HWSIM_PORT_STATUS] = {
+		.id = SWCONFIG_HWSIM_PORT_STATUS,
+		.type = SWITCH_TYPE_STRING,
+		.description = "Returns the port status",
+		.name = "status",
+		.get = swconfig_hwsim_get_port_status,
+	},
+	[SWCONFIG_HWSIM_PORT_LINK] = {
+		.id = SWCONFIG_HWSIM_PORT_LINK,
+		.type = SWITCH_TYPE_INT,
+		.description = "Sets link speed",
+		.name = "link",
+		.get = swconfig_hwsim_get_link,
+		.set = swconfig_hwsim_set_link,
+	},
+};
+
+enum swconfig_hwsim_globals {
+	SWCONFIG_HWSIM_GET_NAME,
+};
+
+static const struct switch_attr swconfig_hwsim_attr_global[] = {
+	[SWCONFIG_HWSIM_GET_NAME] = {
+		.id = SWCONFIG_HWSIM_GET_NAME,
+		.type = SWITCH_TYPE_STRING,
+		.description = "Returns the name of the switch",
+		.name = "name",
+		.get = swconfig_hwsim_get_name,
+	},
+};
+
+static int swconfig_hwsim_apply_config(struct switch_dev *dev)
+{
+	return 0;
+}
+
+static int swconfig_hwsim_reset_switch(struct switch_dev *dev)
+{
+	struct swconfig_hwsim_state *state = get_state(dev);
+	unsigned int i;
+
+	for (i = 0; i < SWCONFIG_HWSIM_NUM_PORTS; i++) {
+		state->ports[i].speed = 1000;
+		state->ports[i].link = 1;
+		state->ports[i].duplex = 1;
+	}
+
+	return 0;
+}
+
+static const struct switch_dev_ops swconfig_hwsim_ops = {
+	.attr_global = {
+		.attr = swconfig_hwsim_attr_global,
+		.n_attr = ARRAY_SIZE(swconfig_hwsim_attr_global),
+	},
+
+	.attr_port = {
+		.attr = swconfig_hwsim_attr_port,
+		.n_attr = ARRAY_SIZE(swconfig_hwsim_attr_port),
+	},
+
+	.apply_config = swconfig_hwsim_apply_config,
+	.reset_switch = swconfig_hwsim_reset_switch,
+};
+
+static struct swconfig_hwsim_state swconfig_hwsim_state = {
+	.dev = {
+		.name = "Fake switch",
+		.ops = &swconfig_hwsim_ops,
+		.ports = SWCONFIG_HWSIM_NUM_PORTS,
+		.cpu_port = SWCONFIG_HWSIM_NUM_PORTS - 1,
+	},
+};
+
+int swconfig_hwsim_init(void)
+{
+	swconfig_hwsim_state.loopback = dev_get_by_name(&init_net, "lo");
+	if (!swconfig_hwsim_state.loopback)
+		return -ENODEV;
+
+
+	return register_switch(&swconfig_hwsim_state.dev,
+				swconfig_hwsim_state.loopback);
+}
+
+void swconfig_hwsim_exit(void)
+{
+	unregister_switch(&swconfig_hwsim_state.dev);
+	dev_put(swconfig_hwsim_state.loopback);
+}
+
+module_init(swconfig_hwsim_init);
+module_exit(swconfig_hwsim_exit);
+
+MODULE_AUTHOR("Florian Fainelli <f.fainelli@gmail.com>");
+MODULE_DESCRIPTION("Fake switch driver");
+MODULE_LICENSE("Dual BSD/GPL");
-- 
1.8.3.2

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-22 18:23 ` [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet " Florian Fainelli
@ 2013-10-22 19:22   ` Dan Williams
  2013-10-22 19:32     ` Florian Fainelli
  2013-10-22 19:46     ` David Miller
  2013-10-22 19:53   ` John Fastabend
  1 sibling, 2 replies; 41+ messages in thread
From: Dan Williams @ 2013-10-22 19:22 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev, davem, s.hauer, nbd, blogic, jogo, gary

On Tue, 2013-10-22 at 11:23 -0700, Florian Fainelli wrote:
> This patch adds an Ethernet Switch generic netlink configuration API
> which allows for doing the required configuration of managed Ethernet
> switches commonly found in Wireless/Cable/DSL routers in the market.

"swconfig" probably means "switch config", but is there any way to
rename this away from the "sw" prefix, since "sw" typically means
"software" and not "switch"?

Dan

> Since this API is based on the Generic Netlink infrastructure it is very
> easy to extend a particular switch driver to support additional features
> and to adapt it to specific switches.
> 
> So far the API includes support for:
> 
> - getting/setting a port VLAN id
> - getting/setting VLAN port membership
> - getting a port link status
> - getting a port statistics counters
> - resetting a switch device
> - applying a configuration to a switch device
> 
> Unlike the Distributed Switch Architecture code, this API is much
> smaller and does not interfere with the networking stack packet flow, but
> rather focuses on the control path of managed switches.
> 
> A concrete example of a switch driver is included in subsequent patches
> to illustrate how it can be used as well as the required user-space
> controlling tools.
> 
> Signed-off-by: Felix Fietkau <nbd@openwrt.org>
> Signed-off-by: John Crispin <blogic@openwrt.org>
> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
> ---
>  Documentation/networking/swconfig.txt |  162 +++++
>  MAINTAINERS                           |   10 +
>  drivers/net/phy/Kconfig               |    6 +
>  drivers/net/phy/Makefile              |    1 +
>  drivers/net/phy/swconfig.c            | 1078 +++++++++++++++++++++++++++++++++
>  include/linux/swconfig.h              |  180 ++++++
>  include/uapi/linux/Kbuild             |    1 +
>  include/uapi/linux/swconfig.h         |  103 ++++
>  8 files changed, 1541 insertions(+)
>  create mode 100644 Documentation/networking/swconfig.txt
>  create mode 100644 drivers/net/phy/swconfig.c
>  create mode 100644 include/linux/swconfig.h
>  create mode 100644 include/uapi/linux/swconfig.h
> 
> diff --git a/Documentation/networking/swconfig.txt b/Documentation/networking/swconfig.txt
> new file mode 100644
> index 0000000..f560066
> --- /dev/null
> +++ b/Documentation/networking/swconfig.txt
> @@ -0,0 +1,162 @@
> +Generic Netlink Switch configuration API
> +
> +Introduction
> +============
> +
> +The following documentation covers the Linux Ethernet switch configuration API
> +which is based on the Generic Netlink infrastructure.
> +
> +Scope and rationale
> +===================
> +
> +Most Ethernet switches found in small routers are managed switches which allow
> +the following operations:
> +
> +- configure a port to belong to a particular set of VLANs either as tagged or
> +  untagged
> +- configure a particular port to advertise specific link/speed/duplex settings
> +- collect statistics about the number of packets/bytes transferred/received
> +- any other vendor specific feature: rate limiting, single/double tagging...
> +
> +Such switches can be connected to the controlling CPU using different hardware
> +busses, but most commonly:
> +
> +- SPI/I2C/GPIO bitbanging
> +- MDIO
> +- Memory mapped into the CPU register address space
> +
> +As of today the usual way to configure such a switch was either to write a
> +specific driver or to write an user-space application which would have to know
> +about the hardware differences and figure out a way to access the switch
> +registers (spidev, SIOCIGGMIIREG, mmap...) from user-space.
> +
> +This has multiple issues:
> +
> +- proliferation of ad-hoc solutions to configure a switch both open source and
> +  proprietary
> +
> +- absence of common software reference for switches commonly found on the market
> +  (Broadcom, Lantiq/Infineon/ADMTek, Marvell, Qualcomm/Atheros...) which implies
> +  a duplication effort for each implementer
> +
> +- inability to leverage existing hardware representation mechanisms such as
> +  Device Tree (spidev, i2c-dev.. do not belong in Device Tree and rely on
> +  Linux-specific "forwarder" drivers) to describe a switch device
> +
> +The goal of the switch configuration API is to provide a common basis to build
> +re-usable and extensible switch drivers with the following ideas in mind:
> +
> +- having a central point of configuration on top of which a reference user-space
> +  implementation can be provided but also allow for other user-space
> +  implementations to exist
> +
> +- ensure the Linux kernel is in control of the actual hardware access
> +
> +- be extensible enough to support per-switch features without making the generic
> +  implementation too heavy weighted and without making user-space changes each
> +  and every time a new feature is added
> +
> +Based on these design goals the Generic Netlink kernel/user-space communication
> +mechanism was chosen because it allows for all design goals to be met.
> +
> +Distributed Switch Architecture vs. swconfig
> +============================================
> +
> +The Marvell Distributed Switch Architecture drivers is an existing solution
> +which is a heavy switch driver infrastructure, is Marvell centric, only
> +supports MDIO connected switches, mangles an Ethernet driver transmit/receive
> +paths and does not offer a central control path for the user.
> +
> +swconfig is vendor agnostic, does not mangle the transmit/receive path
> +of an Ethernet driver and is focused on the control path of the switch rather
> +that the data path. It is based on Generic Netlink to allow for each switch
> +driver to easily extend the swconfig API without causing major core parts rework
> +each and every time someone has a specific feature to implement and offers a
> +central configuration point with a well-defined API.
> +
> +Switch configuration API
> +========================
> +
> +The main data structure of the switch configuration API is a "struct switch_dev"
> +which contains the following members:
> +
> +- a set of common operations to all switches (struct switch_dev_ops)
> +- a network device pointer it is physically attached to
> +- a number of physical switch ports (including CPU port)
> +- a set of configured vlans
> +- a CPU specific port index
> +
> +A particular switch device is registered/unregistered using the following pair
> +of functions:
> +
> +register_switch(struct switch_dev *sw_dev, struct net_device *dev);
> +unregister_switch(struct switch_dev);
> +
> +A given switch driver can be backed by any kind of underlying bus driver (i2c
> +client, GPIO driver, MMIO driver, directly into the Ethernet MAC driver...).
> +
> +The set of common operations to all switches is represented by the "struct
> +switch_dev_ops" function pointers, these common operations are defined as such:
> +
> +- get the port list of a VLAN identifier
> +- set the port list of a VLAN identifier
> +- get the primary VLAN identifier of a port
> +- set the primary VLAN identifier of a port
> +- apply the changed configuration to the switch
> +- reset the switch
> +- get a port link status
> +- get a port statistics counters
> +
> +The switch_dev_ops structure also contains an extensible way of representing and
> +querying switch specific features, 3 different types of attributes are
> +available:
> +
> +- global attributes: attributes global to a switch (name, identifier, number of
> +  ports)
> +- port attributes: per-port specific attributes (MIB counters, enabling port
> +  mirroring...)
> +- vlan attributes: per-VLAN specific attributes (VLAN id, specific VLAN
> +  information)
> +
> +Each of these 3 categories must be represented using an array of "struct
> +switch_attr" attributes. This structure must be filed with:
> +
> +- an unique name for the operation
> +- a description for the operation
> +- a setter operation
> +- a getter operation
> +- a data type (string, integer, port)
> +- eventual min/max limits to validate user input data
> +
> +The "struct switch_attr" directly maps to a Generic Netlink type of command and
> +will be automatically discovered by the "swconfig" user-space utility without
> +requiring user-space changes.
> +
> +User-space reference tool
> +=========================
> +
> +A reference user-space implementation is provided in tools/swconfig in order to
> +directly configure and use a particular switch driver. This reference
> +implementation is linking against libnl-1 for the moment.
> +
> +To build it:
> +
> +make -C tools/swconfig
> +
> +To list the available switches:
> +
> +./tools/swconfig list
> +
> +And to show a particular switch configuration for instance:
> +
> +./tools/swconfig dev eth0 show
> +
> +Fake (simulation) switch driver
> +===============================
> +
> +A fake switch driver called swconfig-hwsim is provided in order to allow for
> +easy testing of API changes and to perform regression testing. This driver will
> +automatically map to the loopback device and will create a fake switch of up to
> +8 Gigabit ports. Each of these ports can be configured with separate
> +speed/duplex/link settings. This driver is gated with the CONFIG_SWCONFIG_HWSIM
> +configuration symbol.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index f169259..3a54262 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8117,6 +8117,16 @@ F:	lib/swiotlb.c
>  F:	arch/*/kernel/pci-swiotlb.c
>  F:	include/linux/swiotlb.h
>  
> +SWITCH CONFIGURATION API
> +M:	Florian Fainelli <f.fainelli@gmail.com>
> +L:	openwrt-devel@lists.openwrt.org
> +L:	netdev@vger.kernel.org
> +S:	Supported
> +F:	drivers/net/ethernet/phy/swconfig*.c
> +F:	include/uapi/linux/switch.h
> +F:	include/linux/switch.h
> +F:	Documentation/networking/swconfig.txt
> +
>  SYNOPSYS ARC ARCHITECTURE
>  M:	Vineet Gupta <vgupta@synopsys.com>
>  S:	Supported
> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
> index 342561a..9b3e117 100644
> --- a/drivers/net/phy/Kconfig
> +++ b/drivers/net/phy/Kconfig
> @@ -12,6 +12,12 @@ menuconfig PHYLIB
>  
>  if PHYLIB
>  
> +config SWCONFIG
> +	tristate "Switch configuration API"
> +	---help---
> +	  Switch configuration API using netlink. This allows
> +	  you to configure the VLAN features of certain switches.
> +
>  comment "MII PHY device drivers"
>  
>  config AT803X_PHY
> diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
> index 23a2ab2..268c7de 100644
> --- a/drivers/net/phy/Makefile
> +++ b/drivers/net/phy/Makefile
> @@ -3,6 +3,7 @@
>  libphy-objs			:= phy.o phy_device.o mdio_bus.o
>  
>  obj-$(CONFIG_PHYLIB)		+= libphy.o
> +obj-$(CONFIG_SWCONFIG)		+= swconfig.o
>  obj-$(CONFIG_MARVELL_PHY)	+= marvell.o
>  obj-$(CONFIG_DAVICOM_PHY)	+= davicom.o
>  obj-$(CONFIG_CICADA_PHY)	+= cicada.o
> diff --git a/drivers/net/phy/swconfig.c b/drivers/net/phy/swconfig.c
> new file mode 100644
> index 0000000..9997c35
> --- /dev/null
> +++ b/drivers/net/phy/swconfig.c
> @@ -0,0 +1,1078 @@
> +/*
> + * Switch configuration API
> + *
> + * Copyright (C) 2008 Felix Fietkau <nbd@openwrt.org>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#include <linux/types.h>
> +#include <linux/module.h>
> +#include <linux/init.h>
> +#include <linux/list.h>
> +#include <linux/if.h>
> +#include <linux/if_ether.h>
> +#include <linux/capability.h>
> +#include <linux/skbuff.h>
> +#include <linux/swconfig.h>
> +
> +#define SWCONFIG_DEVNAME	"switch%d"
> +
> +MODULE_AUTHOR("Felix Fietkau <nbd@openwrt.org>");
> +MODULE_LICENSE("GPL");
> +
> +static int swdev_id;
> +static struct list_head swdevs;
> +static DEFINE_SPINLOCK(swdevs_lock);
> +struct swconfig_callback;
> +
> +struct swconfig_callback {
> +	struct sk_buff *msg;
> +	struct genlmsghdr *hdr;
> +	struct genl_info *info;
> +	int cmd;
> +
> +	/* callback for filling in the message data */
> +	int (*fill)(struct swconfig_callback *cb, void *arg);
> +
> +	/* callback for closing the message before sending it */
> +	int (*close)(struct swconfig_callback *cb, void *arg);
> +
> +	struct nlattr *nest[4];
> +	int args[4];
> +};
> +
> +/* defaults */
> +
> +static int
> +swconfig_get_vlan_ports(struct switch_dev *dev,
> +			const struct switch_attr *attr, struct switch_val *val)
> +{
> +	int ret;
> +	if (val->port_vlan >= dev->vlans)
> +		return -EINVAL;
> +
> +	if (!dev->ops->get_vlan_ports)
> +		return -EOPNOTSUPP;
> +
> +	ret = dev->ops->get_vlan_ports(dev, val);
> +	return ret;
> +}
> +
> +static int
> +swconfig_set_vlan_ports(struct switch_dev *dev,
> +			const struct switch_attr *attr, struct switch_val *val)
> +{
> +	struct switch_port *ports = val->value.ports;
> +	const struct switch_dev_ops *ops = dev->ops;
> +	int i;
> +
> +	if (val->port_vlan >= dev->vlans)
> +		return -EINVAL;
> +
> +	/* validate ports */
> +	if (val->len > dev->ports)
> +		return -EINVAL;
> +
> +	if (!ops->set_vlan_ports)
> +		return -EOPNOTSUPP;
> +
> +	for (i = 0; i < val->len; i++) {
> +		if (ports[i].id >= dev->ports)
> +			return -EINVAL;
> +
> +		if (ops->set_port_pvid &&
> +		    !(ports[i].flags & (1 << SWITCH_PORT_FLAG_TAGGED)))
> +			ops->set_port_pvid(dev, ports[i].id, val->port_vlan);
> +	}
> +
> +	return ops->set_vlan_ports(dev, val);
> +}
> +
> +static int
> +swconfig_set_pvid(struct switch_dev *dev,
> +			const struct switch_attr *attr, struct switch_val *val)
> +{
> +	if (val->port_vlan >= dev->ports)
> +		return -EINVAL;
> +
> +	if (!dev->ops->set_port_pvid)
> +		return -EOPNOTSUPP;
> +
> +	return dev->ops->set_port_pvid(dev, val->port_vlan, val->value.i);
> +}
> +
> +static int
> +swconfig_get_pvid(struct switch_dev *dev,
> +			const struct switch_attr *attr, struct switch_val *val)
> +{
> +	if (val->port_vlan >= dev->ports)
> +		return -EINVAL;
> +
> +	if (!dev->ops->get_port_pvid)
> +		return -EOPNOTSUPP;
> +
> +	return dev->ops->get_port_pvid(dev, val->port_vlan, &val->value.i);
> +}
> +
> +static const char *
> +swconfig_speed_str(enum switch_port_speed speed)
> +{
> +	switch (speed) {
> +	case SWITCH_PORT_SPEED_10:
> +		return "10baseT";
> +	case SWITCH_PORT_SPEED_100:
> +		return "100baseT";
> +	case SWITCH_PORT_SPEED_1000:
> +		return "1000baseT";
> +	default:
> +		break;
> +	}
> +
> +	return "unknown";
> +}
> +
> +static int
> +swconfig_get_link(struct switch_dev *dev,
> +			const struct switch_attr *attr, struct switch_val *val)
> +{
> +	struct switch_port_link link;
> +	int len;
> +	int ret;
> +
> +	if (val->port_vlan >= dev->ports)
> +		return -EINVAL;
> +
> +	if (!dev->ops->get_port_link)
> +		return -EOPNOTSUPP;
> +
> +	memset(&link, 0, sizeof(link));
> +	ret = dev->ops->get_port_link(dev, val->port_vlan, &link);
> +	if (ret)
> +		return ret;
> +
> +	memset(dev->buf, 0, sizeof(dev->buf));
> +
> +	if (link.link)
> +		len = snprintf(dev->buf, sizeof(dev->buf),
> +			       "port:%d link:up speed:%s %s-duplex %s%s%s",
> +			       val->port_vlan,
> +			       swconfig_speed_str(link.speed),
> +			       link.duplex ? "full" : "half",
> +			       link.tx_flow ? "txflow " : "",
> +			       link.rx_flow ?	"rxflow " : "",
> +			       link.aneg ? "auto" : "");
> +	else
> +		len = snprintf(dev->buf, sizeof(dev->buf), "port:%d link:down",
> +			       val->port_vlan);
> +
> +	val->value.s = dev->buf;
> +	val->len = len;
> +
> +	return 0;
> +}
> +
> +static int
> +swconfig_apply_config(struct switch_dev *dev,
> +			const struct switch_attr *attr, struct switch_val *val)
> +{
> +	/* don't complain if not supported by the switch driver */
> +	if (!dev->ops->apply_config)
> +		return 0;
> +
> +	return dev->ops->apply_config(dev);
> +}
> +
> +static int
> +swconfig_reset_switch(struct switch_dev *dev,
> +			const struct switch_attr *attr, struct switch_val *val)
> +{
> +	/* don't complain if not supported by the switch driver */
> +	if (!dev->ops->reset_switch)
> +		return 0;
> +
> +	return dev->ops->reset_switch(dev);
> +}
> +
> +enum global_defaults {
> +	GLOBAL_APPLY,
> +	GLOBAL_RESET,
> +};
> +
> +enum vlan_defaults {
> +	VLAN_PORTS,
> +};
> +
> +enum port_defaults {
> +	PORT_PVID,
> +	PORT_LINK,
> +};
> +
> +static struct switch_attr default_global[] = {
> +	[GLOBAL_APPLY] = {
> +		.type = SWITCH_TYPE_NOVAL,
> +		.name = "apply",
> +		.description = "Activate changes in the hardware",
> +		.set = swconfig_apply_config,
> +	},
> +	[GLOBAL_RESET] = {
> +		.type = SWITCH_TYPE_NOVAL,
> +		.name = "reset",
> +		.description = "Reset the switch",
> +		.set = swconfig_reset_switch,
> +	}
> +};
> +
> +static struct switch_attr default_port[] = {
> +	[PORT_PVID] = {
> +		.type = SWITCH_TYPE_INT,
> +		.name = "pvid",
> +		.description = "Primary VLAN ID",
> +		.set = swconfig_set_pvid,
> +		.get = swconfig_get_pvid,
> +	},
> +	[PORT_LINK] = {
> +		.type = SWITCH_TYPE_STRING,
> +		.name = "link",
> +		.description = "Get port link information",
> +		.set = NULL,
> +		.get = swconfig_get_link,
> +	}
> +};
> +
> +static struct switch_attr default_vlan[] = {
> +	[VLAN_PORTS] = {
> +		.type = SWITCH_TYPE_PORTS,
> +		.name = "ports",
> +		.description = "VLAN port mapping",
> +		.set = swconfig_set_vlan_ports,
> +		.get = swconfig_get_vlan_ports,
> +	},
> +};
> +
> +static const struct switch_attr *
> +swconfig_find_attr_by_name(const struct switch_attrlist *alist,
> +				const char *name)
> +{
> +	int i;
> +
> +	for (i = 0; i < alist->n_attr; i++)
> +		if (strcmp(name, alist->attr[i].name) == 0)
> +			return &alist->attr[i];
> +
> +	return NULL;
> +}
> +
> +static void swconfig_defaults_init(struct switch_dev *dev)
> +{
> +	const struct switch_dev_ops *ops = dev->ops;
> +
> +	dev->def_global = 0;
> +	dev->def_vlan = 0;
> +	dev->def_port = 0;
> +
> +	if (ops->get_vlan_ports || ops->set_vlan_ports)
> +		set_bit(VLAN_PORTS, &dev->def_vlan);
> +
> +	if (ops->get_port_pvid || ops->set_port_pvid)
> +		set_bit(PORT_PVID, &dev->def_port);
> +
> +	if (ops->get_port_link &&
> +	    !swconfig_find_attr_by_name(&ops->attr_port, "link"))
> +		set_bit(PORT_LINK, &dev->def_port);
> +
> +	/* always present, can be no-op */
> +	set_bit(GLOBAL_APPLY, &dev->def_global);
> +	set_bit(GLOBAL_RESET, &dev->def_global);
> +}
> +
> +
> +static struct genl_family switch_fam = {
> +	.id = GENL_ID_GENERATE,
> +	.name = "switch",
> +	.hdrsize = 0,
> +	.version = 1,
> +	.maxattr = SWITCH_ATTR_MAX,
> +};
> +
> +static const struct nla_policy switch_policy[SWITCH_ATTR_MAX+1] = {
> +	[SWITCH_ATTR_ID] = { .type = NLA_U32 },
> +	[SWITCH_ATTR_OP_ID] = { .type = NLA_U32 },
> +	[SWITCH_ATTR_OP_PORT] = { .type = NLA_U32 },
> +	[SWITCH_ATTR_OP_VLAN] = { .type = NLA_U32 },
> +	[SWITCH_ATTR_OP_VALUE_INT] = { .type = NLA_U32 },
> +	[SWITCH_ATTR_OP_VALUE_STR] = { .type = NLA_NUL_STRING },
> +	[SWITCH_ATTR_OP_VALUE_PORTS] = { .type = NLA_NESTED },
> +	[SWITCH_ATTR_TYPE] = { .type = NLA_U32 },
> +};
> +
> +static const struct nla_policy port_policy[SWITCH_PORT_ATTR_MAX+1] = {
> +	[SWITCH_PORT_ID] = { .type = NLA_U32 },
> +	[SWITCH_PORT_FLAG_TAGGED] = { .type = NLA_FLAG },
> +};
> +
> +static inline void
> +swconfig_lock(void)
> +{
> +	spin_lock(&swdevs_lock);
> +}
> +
> +static inline void
> +swconfig_unlock(void)
> +{
> +	spin_unlock(&swdevs_lock);
> +}
> +
> +static struct switch_dev *
> +swconfig_get_dev(struct genl_info *info)
> +{
> +	struct switch_dev *dev = NULL;
> +	struct switch_dev *p;
> +	int id;
> +
> +	if (!info->attrs[SWITCH_ATTR_ID])
> +		goto done;
> +
> +	id = nla_get_u32(info->attrs[SWITCH_ATTR_ID]);
> +	swconfig_lock();
> +	list_for_each_entry(p, &swdevs, dev_list) {
> +		if (id != p->id)
> +			continue;
> +
> +		dev = p;
> +		break;
> +	}
> +	if (dev)
> +		mutex_lock(&dev->sw_mutex);
> +	else
> +		pr_debug("device %d not found\n", id);
> +	swconfig_unlock();
> +done:
> +	return dev;
> +}
> +
> +static inline void
> +swconfig_put_dev(struct switch_dev *dev)
> +{
> +	mutex_unlock(&dev->sw_mutex);
> +}
> +
> +static int
> +swconfig_dump_attr(struct swconfig_callback *cb, void *arg)
> +{
> +	struct switch_attr *op = arg;
> +	struct genl_info *info = cb->info;
> +	struct sk_buff *msg = cb->msg;
> +	int id = cb->args[0];
> +	void *hdr;
> +
> +	hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq, &switch_fam,
> +			NLM_F_MULTI, SWITCH_CMD_NEW_ATTR);
> +	if (IS_ERR(hdr))
> +		return -1;
> +
> +	if (nla_put_u32(msg, SWITCH_ATTR_OP_ID, id))
> +		goto nla_put_failure;
> +	if (nla_put_u32(msg, SWITCH_ATTR_OP_TYPE, op->type))
> +		goto nla_put_failure;
> +	if (nla_put_string(msg, SWITCH_ATTR_OP_NAME, op->name))
> +		goto nla_put_failure;
> +	if (op->description)
> +		if (nla_put_string(msg, SWITCH_ATTR_OP_DESCRIPTION,
> +			op->description))
> +			goto nla_put_failure;
> +
> +	return genlmsg_end(msg, hdr);
> +nla_put_failure:
> +	genlmsg_cancel(msg, hdr);
> +	return -EMSGSIZE;
> +}
> +
> +/* spread multipart messages across multiple message buffers */
> +static int
> +swconfig_send_multipart(struct swconfig_callback *cb, void *arg)
> +{
> +	struct genl_info *info = cb->info;
> +	int restart = 0;
> +	int err;
> +
> +	do {
> +		if (!cb->msg) {
> +			cb->msg = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
> +			if (cb->msg == NULL)
> +				goto error;
> +		}
> +
> +		if (!(cb->fill(cb, arg) < 0))
> +			break;
> +
> +		/* fill failed, check if this was already the second attempt */
> +		if (restart)
> +			goto error;
> +
> +		/* try again in a new message, send the current one */
> +		restart = 1;
> +		if (cb->close) {
> +			if (cb->close(cb, arg) < 0)
> +				goto error;
> +		}
> +		err = genlmsg_reply(cb->msg, info);
> +		cb->msg = NULL;
> +		if (err < 0)
> +			goto error;
> +
> +	} while (restart);
> +
> +	return 0;
> +
> +error:
> +	if (cb->msg)
> +		nlmsg_free(cb->msg);
> +	return -1;
> +}
> +
> +static int
> +swconfig_list_attrs(struct sk_buff *skb, struct genl_info *info)
> +{
> +	struct genlmsghdr *hdr = nlmsg_data(info->nlhdr);
> +	const struct switch_attrlist *alist;
> +	struct switch_dev *dev;
> +	struct swconfig_callback cb;
> +	int err = -EINVAL;
> +	int i;
> +
> +	/* defaults */
> +	struct switch_attr *def_list;
> +	unsigned long *def_active;
> +	int n_def;
> +
> +	dev = swconfig_get_dev(info);
> +	if (!dev)
> +		return -EINVAL;
> +
> +	switch (hdr->cmd) {
> +	case SWITCH_CMD_LIST_GLOBAL:
> +		alist = &dev->ops->attr_global;
> +		def_list = default_global;
> +		def_active = &dev->def_global;
> +		n_def = ARRAY_SIZE(default_global);
> +		break;
> +	case SWITCH_CMD_LIST_VLAN:
> +		alist = &dev->ops->attr_vlan;
> +		def_list = default_vlan;
> +		def_active = &dev->def_vlan;
> +		n_def = ARRAY_SIZE(default_vlan);
> +		break;
> +	case SWITCH_CMD_LIST_PORT:
> +		alist = &dev->ops->attr_port;
> +		def_list = default_port;
> +		def_active = &dev->def_port;
> +		n_def = ARRAY_SIZE(default_port);
> +		break;
> +	default:
> +		WARN_ON(1);
> +		goto out;
> +	}
> +
> +	memset(&cb, 0, sizeof(cb));
> +	cb.info = info;
> +	cb.fill = swconfig_dump_attr;
> +	for (i = 0; i < alist->n_attr; i++) {
> +		if (alist->attr[i].disabled)
> +			continue;
> +		cb.args[0] = i;
> +		err = swconfig_send_multipart(&cb, (void *) &alist->attr[i]);
> +		if (err < 0)
> +			goto error;
> +	}
> +
> +	/* defaults */
> +	for (i = 0; i < n_def; i++) {
> +		if (!test_bit(i, def_active))
> +			continue;
> +		cb.args[0] = SWITCH_ATTR_DEFAULTS_OFFSET + i;
> +		err = swconfig_send_multipart(&cb, (void *) &def_list[i]);
> +		if (err < 0)
> +			goto error;
> +	}
> +	swconfig_put_dev(dev);
> +
> +	if (!cb.msg)
> +		return 0;
> +
> +	return genlmsg_reply(cb.msg, info);
> +
> +error:
> +	if (cb.msg)
> +		nlmsg_free(cb.msg);
> +out:
> +	swconfig_put_dev(dev);
> +	return err;
> +}
> +
> +static const struct switch_attr *
> +swconfig_lookup_attr(struct switch_dev *dev, struct genl_info *info,
> +		struct switch_val *val)
> +{
> +	struct genlmsghdr *hdr = nlmsg_data(info->nlhdr);
> +	const struct switch_attrlist *alist;
> +	const struct switch_attr *attr = NULL;
> +	int attr_id;
> +
> +	/* defaults */
> +	struct switch_attr *def_list;
> +	unsigned long *def_active;
> +	int n_def;
> +
> +	if (!info->attrs[SWITCH_ATTR_OP_ID])
> +		goto done;
> +
> +	switch (hdr->cmd) {
> +	case SWITCH_CMD_SET_GLOBAL:
> +	case SWITCH_CMD_GET_GLOBAL:
> +		alist = &dev->ops->attr_global;
> +		def_list = default_global;
> +		def_active = &dev->def_global;
> +		n_def = ARRAY_SIZE(default_global);
> +		break;
> +	case SWITCH_CMD_SET_VLAN:
> +	case SWITCH_CMD_GET_VLAN:
> +		alist = &dev->ops->attr_vlan;
> +		def_list = default_vlan;
> +		def_active = &dev->def_vlan;
> +		n_def = ARRAY_SIZE(default_vlan);
> +		if (!info->attrs[SWITCH_ATTR_OP_VLAN])
> +			goto done;
> +		val->port_vlan = nla_get_u32(info->attrs[SWITCH_ATTR_OP_VLAN]);
> +		if (val->port_vlan >= dev->vlans)
> +			goto done;
> +		break;
> +	case SWITCH_CMD_SET_PORT:
> +	case SWITCH_CMD_GET_PORT:
> +		alist = &dev->ops->attr_port;
> +		def_list = default_port;
> +		def_active = &dev->def_port;
> +		n_def = ARRAY_SIZE(default_port);
> +		if (!info->attrs[SWITCH_ATTR_OP_PORT])
> +			goto done;
> +		val->port_vlan = nla_get_u32(info->attrs[SWITCH_ATTR_OP_PORT]);
> +		if (val->port_vlan >= dev->ports)
> +			goto done;
> +		break;
> +	default:
> +		WARN_ON(1);
> +		goto done;
> +	}
> +
> +	if (!alist)
> +		goto done;
> +
> +	attr_id = nla_get_u32(info->attrs[SWITCH_ATTR_OP_ID]);
> +	if (attr_id >= SWITCH_ATTR_DEFAULTS_OFFSET) {
> +		attr_id -= SWITCH_ATTR_DEFAULTS_OFFSET;
> +		if (attr_id >= n_def)
> +			goto done;
> +		if (!test_bit(attr_id, def_active))
> +			goto done;
> +		attr = &def_list[attr_id];
> +	} else {
> +		if (attr_id >= alist->n_attr)
> +			goto done;
> +		attr = &alist->attr[attr_id];
> +	}
> +
> +	if (attr->disabled)
> +		attr = NULL;
> +
> +done:
> +	if (!attr)
> +		pr_debug("attribute lookup failed\n");
> +	val->attr = attr;
> +	return attr;
> +}
> +
> +static int
> +swconfig_parse_ports(struct sk_buff *msg, struct nlattr *head,
> +		struct switch_val *val, int max)
> +{
> +	struct nlattr *nla;
> +	int rem;
> +
> +	val->len = 0;
> +	nla_for_each_nested(nla, head, rem) {
> +		struct nlattr *tb[SWITCH_PORT_ATTR_MAX+1];
> +		struct switch_port *port = &val->value.ports[val->len];
> +
> +		if (val->len >= max)
> +			return -EINVAL;
> +
> +		if (nla_parse_nested(tb, SWITCH_PORT_ATTR_MAX, nla,
> +				port_policy))
> +			return -EINVAL;
> +
> +		if (!tb[SWITCH_PORT_ID])
> +			return -EINVAL;
> +
> +		port->id = nla_get_u32(tb[SWITCH_PORT_ID]);
> +		if (tb[SWITCH_PORT_FLAG_TAGGED])
> +			port->flags |= (1 << SWITCH_PORT_FLAG_TAGGED);
> +		val->len++;
> +	}
> +
> +	return 0;
> +}
> +
> +static int
> +swconfig_set_attr(struct sk_buff *skb, struct genl_info *info)
> +{
> +	const struct switch_attr *attr;
> +	struct switch_dev *dev;
> +	struct switch_val val;
> +	int err = -EINVAL;
> +
> +	dev = swconfig_get_dev(info);
> +	if (!dev)
> +		return -EINVAL;
> +
> +	memset(&val, 0, sizeof(val));
> +	attr = swconfig_lookup_attr(dev, info, &val);
> +	if (!attr || !attr->set)
> +		goto error;
> +
> +	val.attr = attr;
> +	switch (attr->type) {
> +	case SWITCH_TYPE_NOVAL:
> +		break;
> +	case SWITCH_TYPE_INT:
> +		if (!info->attrs[SWITCH_ATTR_OP_VALUE_INT])
> +			goto error;
> +		val.value.i =
> +			nla_get_u32(info->attrs[SWITCH_ATTR_OP_VALUE_INT]);
> +		break;
> +	case SWITCH_TYPE_STRING:
> +		if (!info->attrs[SWITCH_ATTR_OP_VALUE_STR])
> +			goto error;
> +		val.value.s =
> +			nla_data(info->attrs[SWITCH_ATTR_OP_VALUE_STR]);
> +		break;
> +	case SWITCH_TYPE_PORTS:
> +		val.value.ports = dev->portbuf;
> +		memset(dev->portbuf, 0,
> +			sizeof(struct switch_port) * dev->ports);
> +
> +		/* TODO: implement multipart? */
> +		if (info->attrs[SWITCH_ATTR_OP_VALUE_PORTS]) {
> +			err = swconfig_parse_ports(skb,
> +				info->attrs[SWITCH_ATTR_OP_VALUE_PORTS],
> +				&val, dev->ports);
> +			if (err < 0)
> +				goto error;
> +		} else {
> +			val.len = 0;
> +			err = 0;
> +		}
> +		break;
> +	default:
> +		goto error;
> +	}
> +
> +	err = attr->set(dev, attr, &val);
> +error:
> +	swconfig_put_dev(dev);
> +	return err;
> +}
> +
> +static int
> +swconfig_close_portlist(struct swconfig_callback *cb, void *arg)
> +{
> +	if (cb->nest[0])
> +		nla_nest_end(cb->msg, cb->nest[0]);
> +	return 0;
> +}
> +
> +static int
> +swconfig_send_port(struct swconfig_callback *cb, void *arg)
> +{
> +	const struct switch_port *port = arg;
> +	struct nlattr *p = NULL;
> +
> +	if (!cb->nest[0]) {
> +		cb->nest[0] = nla_nest_start(cb->msg, cb->cmd);
> +		if (!cb->nest[0])
> +			return -1;
> +	}
> +
> +	p = nla_nest_start(cb->msg, SWITCH_ATTR_PORT);
> +	if (!p)
> +		goto error;
> +
> +	if (nla_put_u32(cb->msg, SWITCH_PORT_ID, port->id))
> +		goto nla_put_failure;
> +	if (port->flags & (1 << SWITCH_PORT_FLAG_TAGGED)) {
> +		if (nla_put_flag(cb->msg, SWITCH_PORT_FLAG_TAGGED))
> +			goto nla_put_failure;
> +	}
> +
> +	nla_nest_end(cb->msg, p);
> +	return 0;
> +
> +nla_put_failure:
> +		nla_nest_cancel(cb->msg, p);
> +error:
> +	nla_nest_cancel(cb->msg, cb->nest[0]);
> +	return -1;
> +}
> +
> +static int
> +swconfig_send_ports(struct sk_buff **msg, struct genl_info *info, int attr,
> +		const struct switch_val *val)
> +{
> +	struct swconfig_callback cb;
> +	int err = 0;
> +	int i;
> +
> +	if (!val->value.ports)
> +		return -EINVAL;
> +
> +	memset(&cb, 0, sizeof(cb));
> +	cb.cmd = attr;
> +	cb.msg = *msg;
> +	cb.info = info;
> +	cb.fill = swconfig_send_port;
> +	cb.close = swconfig_close_portlist;
> +
> +	cb.nest[0] = nla_nest_start(cb.msg, cb.cmd);
> +	for (i = 0; i < val->len; i++) {
> +		err = swconfig_send_multipart(&cb, &val->value.ports[i]);
> +		if (err)
> +			goto done;
> +	}
> +	err = val->len;
> +	swconfig_close_portlist(&cb, NULL);
> +	*msg = cb.msg;
> +
> +done:
> +	return err;
> +}
> +
> +static int
> +swconfig_get_attr(struct sk_buff *skb, struct genl_info *info)
> +{
> +	struct genlmsghdr *hdr = nlmsg_data(info->nlhdr);
> +	const struct switch_attr *attr;
> +	struct switch_dev *dev;
> +	struct sk_buff *msg = NULL;
> +	struct switch_val val;
> +	int err = -EINVAL;
> +	int cmd = hdr->cmd;
> +
> +	dev = swconfig_get_dev(info);
> +	if (!dev)
> +		return -EINVAL;
> +
> +	memset(&val, 0, sizeof(val));
> +	attr = swconfig_lookup_attr(dev, info, &val);
> +	if (!attr || !attr->get)
> +		goto error;
> +
> +	if (attr->type == SWITCH_TYPE_PORTS) {
> +		val.value.ports = dev->portbuf;
> +		memset(dev->portbuf, 0,
> +			sizeof(struct switch_port) * dev->ports);
> +	}
> +
> +	err = attr->get(dev, attr, &val);
> +	if (err)
> +		goto error;
> +
> +	msg = nlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
> +	if (!msg)
> +		goto error;
> +
> +	hdr = genlmsg_put(msg, info->snd_portid, info->snd_seq, &switch_fam,
> +			0, cmd);
> +	if (IS_ERR(hdr))
> +		goto nla_put_failure;
> +
> +	switch (attr->type) {
> +	case SWITCH_TYPE_INT:
> +		if (nla_put_u32(msg, SWITCH_ATTR_OP_VALUE_INT, val.value.i))
> +			goto nla_put_failure;
> +		break;
> +	case SWITCH_TYPE_STRING:
> +		if (nla_put_string(msg, SWITCH_ATTR_OP_VALUE_STR, val.value.s))
> +			goto nla_put_failure;
> +		break;
> +	case SWITCH_TYPE_PORTS:
> +		err = swconfig_send_ports(&msg, info,
> +				SWITCH_ATTR_OP_VALUE_PORTS, &val);
> +		if (err < 0)
> +			goto nla_put_failure;
> +		break;
> +	default:
> +		pr_debug("invalid type in attribute\n");
> +		err = -EINVAL;
> +		goto error;
> +	}
> +	err = genlmsg_end(msg, hdr);
> +	if (err < 0)
> +		goto nla_put_failure;
> +
> +	swconfig_put_dev(dev);
> +	return genlmsg_reply(msg, info);
> +
> +nla_put_failure:
> +	if (msg)
> +		nlmsg_free(msg);
> +error:
> +	swconfig_put_dev(dev);
> +	if (!err)
> +		err = -ENOMEM;
> +	return err;
> +}
> +
> +static int
> +swconfig_send_switch(struct sk_buff *msg, u32 pid, u32 seq, int flags,
> +		const struct switch_dev *dev)
> +{
> +	struct nlattr *p = NULL, *m = NULL;
> +	void *hdr;
> +	int i;
> +
> +	hdr = genlmsg_put(msg, pid, seq, &switch_fam, flags,
> +			SWITCH_CMD_NEW_ATTR);
> +	if (IS_ERR(hdr))
> +		return -1;
> +
> +	if (nla_put_u32(msg, SWITCH_ATTR_ID, dev->id))
> +		goto nla_put_failure;
> +	if (nla_put_string(msg, SWITCH_ATTR_DEV_NAME, dev->devname))
> +		goto nla_put_failure;
> +	if (nla_put_string(msg, SWITCH_ATTR_ALIAS, dev->alias))
> +		goto nla_put_failure;
> +	if (nla_put_string(msg, SWITCH_ATTR_NAME, dev->name))
> +		goto nla_put_failure;
> +	if (nla_put_u32(msg, SWITCH_ATTR_VLANS, dev->vlans))
> +		goto nla_put_failure;
> +	if (nla_put_u32(msg, SWITCH_ATTR_PORTS, dev->ports))
> +		goto nla_put_failure;
> +	if (nla_put_u32(msg, SWITCH_ATTR_CPU_PORT, dev->cpu_port))
> +		goto nla_put_failure;
> +
> +	m = nla_nest_start(msg, SWITCH_ATTR_PORTMAP);
> +	if (!m)
> +		goto nla_put_failure;
> +	for (i = 0; i < dev->ports; i++) {
> +		p = nla_nest_start(msg, SWITCH_ATTR_PORTS);
> +		if (!p)
> +			continue;
> +		if (dev->portmap[i].s) {
> +			if (nla_put_string(msg, SWITCH_PORTMAP_SEGMENT,
> +						dev->portmap[i].s))
> +				goto nla_put_failure;
> +			if (nla_put_u32(msg, SWITCH_PORTMAP_VIRT,
> +						dev->portmap[i].virt))
> +				goto nla_put_failure;
> +		}
> +		nla_nest_end(msg, p);
> +	}
> +	nla_nest_end(msg, m);
> +	return genlmsg_end(msg, hdr);
> +nla_put_failure:
> +	genlmsg_cancel(msg, hdr);
> +	return -EMSGSIZE;
> +}
> +
> +static int swconfig_dump_switches(struct sk_buff *skb,
> +		struct netlink_callback *cb)
> +{
> +	struct switch_dev *dev;
> +	int start = cb->args[0];
> +	int idx = 0;
> +
> +	swconfig_lock();
> +	list_for_each_entry(dev, &swdevs, dev_list) {
> +		if (++idx <= start)
> +			continue;
> +		if (swconfig_send_switch(skb, NETLINK_CB(cb->skb).portid,
> +				cb->nlh->nlmsg_seq, NLM_F_MULTI,
> +				dev) < 0)
> +			break;
> +	}
> +	swconfig_unlock();
> +	cb->args[0] = idx;
> +
> +	return skb->len;
> +}
> +
> +static int
> +swconfig_done(struct netlink_callback *cb)
> +{
> +	return 0;
> +}
> +
> +static struct genl_ops swconfig_ops[] = {
> +	{
> +		.cmd = SWITCH_CMD_LIST_GLOBAL,
> +		.doit = swconfig_list_attrs,
> +		.policy = switch_policy,
> +	},
> +	{
> +		.cmd = SWITCH_CMD_LIST_VLAN,
> +		.doit = swconfig_list_attrs,
> +		.policy = switch_policy,
> +	},
> +	{
> +		.cmd = SWITCH_CMD_LIST_PORT,
> +		.doit = swconfig_list_attrs,
> +		.policy = switch_policy,
> +	},
> +	{
> +		.cmd = SWITCH_CMD_GET_GLOBAL,
> +		.doit = swconfig_get_attr,
> +		.policy = switch_policy,
> +	},
> +	{
> +		.cmd = SWITCH_CMD_GET_VLAN,
> +		.doit = swconfig_get_attr,
> +		.policy = switch_policy,
> +	},
> +	{
> +		.cmd = SWITCH_CMD_GET_PORT,
> +		.doit = swconfig_get_attr,
> +		.policy = switch_policy,
> +	},
> +	{
> +		.cmd = SWITCH_CMD_SET_GLOBAL,
> +		.doit = swconfig_set_attr,
> +		.policy = switch_policy,
> +	},
> +	{
> +		.cmd = SWITCH_CMD_SET_VLAN,
> +		.doit = swconfig_set_attr,
> +		.policy = switch_policy,
> +	},
> +	{
> +		.cmd = SWITCH_CMD_SET_PORT,
> +		.doit = swconfig_set_attr,
> +		.policy = switch_policy,
> +	},
> +	{
> +		.cmd = SWITCH_CMD_GET_SWITCH,
> +		.dumpit = swconfig_dump_switches,
> +		.policy = switch_policy,
> +		.done = swconfig_done,
> +	}
> +};
> +
> +int
> +register_switch(struct switch_dev *dev, struct net_device *netdev)
> +{
> +	struct switch_dev *sdev;
> +	const int max_switches = 8 * sizeof(unsigned long);
> +	unsigned long in_use = 0;
> +	int i;
> +
> +	INIT_LIST_HEAD(&dev->dev_list);
> +	if (netdev) {
> +		dev->netdev = netdev;
> +		if (!dev->alias)
> +			dev->alias = netdev->name;
> +	}
> +	BUG_ON(!dev->alias);
> +
> +	if (dev->ports > 0) {
> +		dev->portbuf = kzalloc(sizeof(struct switch_port) *
> +				dev->ports, GFP_KERNEL);
> +		if (!dev->portbuf)
> +			return -ENOMEM;
> +		dev->portmap = kzalloc(sizeof(struct switch_portmap) *
> +				dev->ports, GFP_KERNEL);
> +		if (!dev->portmap) {
> +			kfree(dev->portbuf);
> +			return -ENOMEM;
> +		}
> +	}
> +	swconfig_defaults_init(dev);
> +	mutex_init(&dev->sw_mutex);
> +	swconfig_lock();
> +	dev->id = ++swdev_id;
> +
> +	list_for_each_entry(sdev, &swdevs, dev_list) {
> +		if (!sscanf(sdev->devname, SWCONFIG_DEVNAME, &i))
> +			continue;
> +		if (i < 0 || i > max_switches)
> +			continue;
> +
> +		set_bit(i, &in_use);
> +	}
> +	i = find_first_zero_bit(&in_use, max_switches);
> +
> +	if (i == max_switches) {
> +		swconfig_unlock();
> +		return -ENFILE;
> +	}
> +
> +	/* fill device name */
> +	snprintf(dev->devname, IFNAMSIZ, SWCONFIG_DEVNAME, i);
> +
> +	list_add(&dev->dev_list, &swdevs);
> +	swconfig_unlock();
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(register_switch);
> +
> +void
> +unregister_switch(struct switch_dev *dev)
> +{
> +	kfree(dev->portbuf);
> +	mutex_lock(&dev->sw_mutex);
> +	swconfig_lock();
> +	list_del(&dev->dev_list);
> +	swconfig_unlock();
> +	mutex_unlock(&dev->sw_mutex);
> +}
> +EXPORT_SYMBOL_GPL(unregister_switch);
> +
> +
> +static int __init
> +swconfig_init(void)
> +{
> +	int i, err;
> +
> +	INIT_LIST_HEAD(&swdevs);
> +	err = genl_register_family(&switch_fam);
> +	if (err)
> +		return err;
> +
> +	for (i = 0; i < ARRAY_SIZE(swconfig_ops); i++) {
> +		err = genl_register_ops(&switch_fam, &swconfig_ops[i]);
> +		if (err)
> +			goto unregister;
> +	}
> +
> +	return 0;
> +
> +unregister:
> +	genl_unregister_family(&switch_fam);
> +	return err;
> +}
> +
> +static void __exit
> +swconfig_exit(void)
> +{
> +	genl_unregister_family(&switch_fam);
> +}
> +
> +module_init(swconfig_init);
> +module_exit(swconfig_exit);
> +
> diff --git a/include/linux/swconfig.h b/include/linux/swconfig.h
> new file mode 100644
> index 0000000..fd96eec
> --- /dev/null
> +++ b/include/linux/swconfig.h
> @@ -0,0 +1,180 @@
> +/*
> + * Switch configuration API
> + *
> + * Copyright (C) 2008 Felix Fietkau <nbd@openwrt.org>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +#ifndef _LINUX_SWITCH_H
> +#define _LINUX_SWITCH_H
> +
> +#include <net/genetlink.h>
> +#include <uapi/linux/swconfig.h>
> +
> +struct switch_dev;
> +struct switch_op;
> +struct switch_val;
> +struct switch_attr;
> +struct switch_attrlist;
> +struct switch_led_trigger;
> +
> +int register_switch(struct switch_dev *dev, struct net_device *netdev);
> +void unregister_switch(struct switch_dev *dev);
> +
> +/**
> + * struct switch_attrlist - attribute list
> + *
> + * @n_attr: number of attributes
> + * @attr: pointer to the attributes array
> + */
> +struct switch_attrlist {
> +	int n_attr;
> +	const struct switch_attr *attr;
> +};
> +
> +enum switch_port_speed {
> +	SWITCH_PORT_SPEED_UNKNOWN = 0,
> +	SWITCH_PORT_SPEED_10 = 10,
> +	SWITCH_PORT_SPEED_100 = 100,
> +	SWITCH_PORT_SPEED_1000 = 1000,
> +};
> +
> +struct switch_port_link {
> +	bool link;
> +	bool duplex;
> +	bool aneg;
> +	bool tx_flow;
> +	bool rx_flow;
> +	enum switch_port_speed speed;
> +};
> +
> +struct switch_port_stats {
> +	unsigned long tx_bytes;
> +	unsigned long rx_bytes;
> +};
> +
> +/**
> + * struct switch_dev_ops - switch driver operations
> + *
> + * @attr_global: global switch attribute list
> + * @attr_port: port attribute list
> + * @attr_vlan: vlan attribute list
> + *
> + * Callbacks:
> + *
> + * @get_vlan_ports: read the port list of a VLAN
> + * @set_vlan_ports: set the port list of a VLAN
> + *
> + * @get_port_pvid: get the primary VLAN ID of a port
> + * @set_port_pvid: set the primary VLAN ID of a port
> + *
> + * @apply_config: apply all changed settings to the switch
> + * @reset_switch: resetting the switch
> + *
> + * @get_port_link: read the port link status
> + * @get_port_stats: read the port statistics counters
> + */
> +struct switch_dev_ops {
> +	struct switch_attrlist attr_global, attr_port, attr_vlan;
> +
> +	int (*get_vlan_ports)(struct switch_dev *dev, struct switch_val *val);
> +	int (*set_vlan_ports)(struct switch_dev *dev, struct switch_val *val);
> +
> +	int (*get_port_pvid)(struct switch_dev *dev, int port, int *val);
> +	int (*set_port_pvid)(struct switch_dev *dev, int port, int val);
> +
> +	int (*apply_config)(struct switch_dev *dev);
> +	int (*reset_switch)(struct switch_dev *dev);
> +
> +	int (*get_port_link)(struct switch_dev *dev, int port,
> +			     struct switch_port_link *link);
> +	int (*get_port_stats)(struct switch_dev *dev, int port,
> +			      struct switch_port_stats *stats);
> +};
> +
> +/**
> + * struct switch_dev - switch device
> + *
> + * @ops: switch driver operations pointer
> + * @devname: switch device name (automatically filled)
> + * @name: switch driver name returned to user-space
> + * @alias: alias name for the switch (instead of ethX) returned to user-space
> + * @netdev: network device pointer if alias is not used
> + *
> + * @ports: number of physical switch ports
> + * @vlans: number of supported VLANs
> + * @cpu_port: identifier for the CPU port
> + */
> +struct switch_dev {
> +	const struct switch_dev_ops *ops;
> +	/* will be automatically filled */
> +	char devname[IFNAMSIZ];
> +
> +	const char *name;
> +	/* NB: either alias or netdev must be set */
> +	const char *alias;
> +	struct net_device *netdev;
> +
> +	int ports;
> +	int vlans;
> +	int cpu_port;
> +
> +	/* the following fields are internal for swconfig */
> +	int id;
> +	struct list_head dev_list;
> +	unsigned long def_global, def_port, def_vlan;
> +
> +	struct mutex sw_mutex;
> +	struct switch_port *portbuf;
> +	struct switch_portmap *portmap;
> +
> +	char buf[128];
> +};
> +
> +struct switch_port {
> +	u32 id;
> +	u32 flags;
> +};
> +
> +struct switch_portmap {
> +	u32 virt;
> +	const char *s;
> +};
> +
> +struct switch_val {
> +	const struct switch_attr *attr;
> +	int port_vlan;
> +	int len;
> +	union {
> +		const char *s;
> +		u32 i;
> +		struct switch_port *ports;
> +	} value;
> +};
> +
> +struct switch_attr {
> +	int disabled;
> +	int type;
> +	const char *name;
> +	const char *description;
> +
> +	int (*set)(struct switch_dev *dev, const struct switch_attr *attr,
> +			struct switch_val *val);
> +	int (*get)(struct switch_dev *dev, const struct switch_attr *attr,
> +			struct switch_val *val);
> +
> +	/* for driver internal use */
> +	int id;
> +	int ofs;
> +	int max;
> +};
> +
> +#endif /* _LINUX_SWITCH_H */
> diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
> index 115add2..0a995be 100644
> --- a/include/uapi/linux/Kbuild
> +++ b/include/uapi/linux/Kbuild
> @@ -363,6 +363,7 @@ header-y += stddef.h
>  header-y += string.h
>  header-y += suspend_ioctls.h
>  header-y += swab.h
> +header-y += swconfig.h
>  header-y += synclink.h
>  header-y += sysctl.h
>  header-y += sysinfo.h
> diff --git a/include/uapi/linux/swconfig.h b/include/uapi/linux/swconfig.h
> new file mode 100644
> index 0000000..17cf178
> --- /dev/null
> +++ b/include/uapi/linux/swconfig.h
> @@ -0,0 +1,103 @@
> +/*
> + * Switch configuration API
> + *
> + * Copyright (C) 2008 Felix Fietkau <nbd@openwrt.org>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#ifndef _UAPI_LINUX_SWITCH_H
> +#define _UAPI_LINUX_SWITCH_H
> +
> +#include <linux/types.h>
> +#include <linux/netdevice.h>
> +#include <linux/netlink.h>
> +#include <linux/genetlink.h>
> +#ifndef __KERNEL__
> +#include <netlink/netlink.h>
> +#include <netlink/genl/genl.h>
> +#include <netlink/genl/ctrl.h>
> +#endif
> +
> +/* main attributes */
> +enum {
> +	SWITCH_ATTR_UNSPEC,
> +	/* global */
> +	SWITCH_ATTR_TYPE,
> +	/* device */
> +	SWITCH_ATTR_ID,
> +	SWITCH_ATTR_DEV_NAME,
> +	SWITCH_ATTR_ALIAS,
> +	SWITCH_ATTR_NAME,
> +	SWITCH_ATTR_VLANS,
> +	SWITCH_ATTR_PORTS,
> +	SWITCH_ATTR_PORTMAP,
> +	SWITCH_ATTR_CPU_PORT,
> +	/* attributes */
> +	SWITCH_ATTR_OP_ID,
> +	SWITCH_ATTR_OP_TYPE,
> +	SWITCH_ATTR_OP_NAME,
> +	SWITCH_ATTR_OP_PORT,
> +	SWITCH_ATTR_OP_VLAN,
> +	SWITCH_ATTR_OP_VALUE_INT,
> +	SWITCH_ATTR_OP_VALUE_STR,
> +	SWITCH_ATTR_OP_VALUE_PORTS,
> +	SWITCH_ATTR_OP_DESCRIPTION,
> +	/* port lists */
> +	SWITCH_ATTR_PORT,
> +	SWITCH_ATTR_MAX
> +};
> +
> +enum {
> +	/* port map */
> +	SWITCH_PORTMAP_PORTS,
> +	SWITCH_PORTMAP_SEGMENT,
> +	SWITCH_PORTMAP_VIRT,
> +	SWITCH_PORTMAP_MAX
> +};
> +
> +/* commands */
> +enum {
> +	SWITCH_CMD_UNSPEC,
> +	SWITCH_CMD_GET_SWITCH,
> +	SWITCH_CMD_NEW_ATTR,
> +	SWITCH_CMD_LIST_GLOBAL,
> +	SWITCH_CMD_GET_GLOBAL,
> +	SWITCH_CMD_SET_GLOBAL,
> +	SWITCH_CMD_LIST_PORT,
> +	SWITCH_CMD_GET_PORT,
> +	SWITCH_CMD_SET_PORT,
> +	SWITCH_CMD_LIST_VLAN,
> +	SWITCH_CMD_GET_VLAN,
> +	SWITCH_CMD_SET_VLAN
> +};
> +
> +/* data types */
> +enum switch_val_type {
> +	SWITCH_TYPE_UNSPEC,
> +	SWITCH_TYPE_INT,
> +	SWITCH_TYPE_STRING,
> +	SWITCH_TYPE_PORTS,
> +	SWITCH_TYPE_NOVAL,
> +};
> +
> +/* port nested attributes */
> +enum {
> +	SWITCH_PORT_UNSPEC,
> +	SWITCH_PORT_ID,
> +	SWITCH_PORT_FLAG_TAGGED,
> +	SWITCH_PORT_ATTR_MAX
> +};
> +
> +#define SWITCH_ATTR_DEFAULTS_OFFSET	0x1000
> +
> +
> +#endif /* _UAPI_LINUX_SWITCH_H */

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-22 19:22   ` Dan Williams
@ 2013-10-22 19:32     ` Florian Fainelli
  2013-10-22 19:47       ` David Miller
  2013-10-22 19:46     ` David Miller
  1 sibling, 1 reply; 41+ messages in thread
From: Florian Fainelli @ 2013-10-22 19:32 UTC (permalink / raw)
  To: Dan Williams
  Cc: netdev, David Miller, Sascha Hauer, Felix Fietkau, John Crispin,
	Jonas Gorski, Gary Thomas

2013/10/22 Dan Williams <dcbw@redhat.com>:
> On Tue, 2013-10-22 at 11:23 -0700, Florian Fainelli wrote:
>> This patch adds an Ethernet Switch generic netlink configuration API
>> which allows for doing the required configuration of managed Ethernet
>> switches commonly found in Wireless/Cable/DSL routers in the market.
>
> "swconfig" probably means "switch config", but is there any way to
> rename this away from the "sw" prefix, since "sw" typically means
> "software" and not "switch"?

Sure, how about something like "enetsw"? I would like to avoid using
"switch" too much since this is a C reserved keyword.
-- 
Florian

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-22 19:22   ` Dan Williams
  2013-10-22 19:32     ` Florian Fainelli
@ 2013-10-22 19:46     ` David Miller
  1 sibling, 0 replies; 41+ messages in thread
From: David Miller @ 2013-10-22 19:46 UTC (permalink / raw)
  To: dcbw; +Cc: f.fainelli, netdev, s.hauer, nbd, blogic, jogo, gary

From: Dan Williams <dcbw@redhat.com>
Date: Tue, 22 Oct 2013 14:22:26 -0500

> On Tue, 2013-10-22 at 11:23 -0700, Florian Fainelli wrote:
>> This patch adds an Ethernet Switch generic netlink configuration API
>> which allows for doing the required configuration of managed Ethernet
>> switches commonly found in Wireless/Cable/DSL routers in the market.
> 
> "swconfig" probably means "switch config", but is there any way to
> rename this away from the "sw" prefix, since "sw" typically means
> "software" and not "switch"?

Agreed.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-22 19:32     ` Florian Fainelli
@ 2013-10-22 19:47       ` David Miller
       [not found]         ` <1382477150.19269.69.camel@dcbw.foobar.com>
  0 siblings, 1 reply; 41+ messages in thread
From: David Miller @ 2013-10-22 19:47 UTC (permalink / raw)
  To: f.fainelli; +Cc: dcbw, netdev, s.hauer, nbd, blogic, jogo, gary

From: Florian Fainelli <f.fainelli@gmail.com>
Date: Tue, 22 Oct 2013 12:32:29 -0700

> 2013/10/22 Dan Williams <dcbw@redhat.com>:
>> On Tue, 2013-10-22 at 11:23 -0700, Florian Fainelli wrote:
>>> This patch adds an Ethernet Switch generic netlink configuration API
>>> which allows for doing the required configuration of managed Ethernet
>>> switches commonly found in Wireless/Cable/DSL routers in the market.
>>
>> "swconfig" probably means "switch config", but is there any way to
>> rename this away from the "sw" prefix, since "sw" typically means
>> "software" and not "switch"?
> 
> Sure, how about something like "enetsw"? I would like to avoid using
> "switch" too much since this is a C reserved keyword.

"swtch"? :-)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-22 18:23 ` [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet " Florian Fainelli
  2013-10-22 19:22   ` Dan Williams
@ 2013-10-22 19:53   ` John Fastabend
  2013-10-22 19:59     ` Florian Fainelli
  1 sibling, 1 reply; 41+ messages in thread
From: John Fastabend @ 2013-10-22 19:53 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: netdev, davem, s.hauer, nbd, blogic, jogo, gary,
	Jamal Hadi Salim, Neil Horman

On 10/22/2013 11:23 AM, Florian Fainelli wrote:
> This patch adds an Ethernet Switch generic netlink configuration API
> which allows for doing the required configuration of managed Ethernet
> switches commonly found in Wireless/Cable/DSL routers in the market.
>
> Since this API is based on the Generic Netlink infrastructure it is very
> easy to extend a particular switch driver to support additional features
> and to adapt it to specific switches.
>

> So far the API includes support for:
>
> - getting/setting a port VLAN id
> - getting/setting VLAN port membership
> - getting a port link status
> - getting a port statistics counters
> - resetting a switch device
> - applying a configuration to a switch device
>

Did you consider exposing each physical switch port as a netdevice on
the host? I would assume your switch driver could do this.

Then you can drop the port specific attributes (link status, stats, etc)
and use existing interfaces. The win being my tools work equally well on
your real switch as they do on my software switch. Also by exposing net
devices you provide a mechanism to send packets over the port and trap
control packets.

Next instead of creating a switch specific netlink API could you use
the existing FDB API? Again what I would like is for my existing
applications to run on the switch without having to rewrite them. For
example it would be great to have 'bridge fdb show dev myswitch' report
the correct tables for both the Sw bridge, a real switch bridge, and
for the embedded SR-IOV bridge case.

I added Jamal and Neil because I think I remember talking about similar
ideas with them before.

Thanks,
.John

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-22 19:53   ` John Fastabend
@ 2013-10-22 19:59     ` Florian Fainelli
  2013-10-22 20:25       ` Neil Horman
  0 siblings, 1 reply; 41+ messages in thread
From: Florian Fainelli @ 2013-10-22 19:59 UTC (permalink / raw)
  To: John Fastabend
  Cc: netdev, David Miller, Sascha Hauer, Felix Fietkau, John Crispin,
	Jonas Gorski, Gary Thomas, Jamal Hadi Salim, Neil Horman

2013/10/22 John Fastabend <john.r.fastabend@intel.com>:
> On 10/22/2013 11:23 AM, Florian Fainelli wrote:
>>
>> This patch adds an Ethernet Switch generic netlink configuration API
>> which allows for doing the required configuration of managed Ethernet
>> switches commonly found in Wireless/Cable/DSL routers in the market.
>>
>> Since this API is based on the Generic Netlink infrastructure it is very
>> easy to extend a particular switch driver to support additional features
>> and to adapt it to specific switches.
>>
>
>> So far the API includes support for:
>>
>> - getting/setting a port VLAN id
>> - getting/setting VLAN port membership
>> - getting a port link status
>> - getting a port statistics counters
>> - resetting a switch device
>> - applying a configuration to a switch device
>>
>
> Did you consider exposing each physical switch port as a netdevice on
> the host? I would assume your switch driver could do this.
>
> Then you can drop the port specific attributes (link status, stats, etc)
> and use existing interfaces. The win being my tools work equally well on
> your real switch as they do on my software switch. Also by exposing net
> devices you provide a mechanism to send packets over the port and trap
> control packets.

Well this is exactly what DSA does and which I do not like because it
is completely overkill for most switches out there which are using
802.1q tags and do not prepend/append proprietary tags for internal
traffic classification.

>
> Next instead of creating a switch specific netlink API could you use
> the existing FDB API? Again what I would like is for my existing
> applications to run on the switch without having to rewrite them. For
> example it would be great to have 'bridge fdb show dev myswitch' report
> the correct tables for both the Sw bridge, a real switch bridge, and
> for the embedded SR-IOV bridge case.

Ok, I know nothing about the FDB API, but will take a look and see if
that sounds suitable for the embedded use cases.

>
> I added Jamal and Neil because I think I remember talking about similar
> ideas with them before.

Thanks!
-- 
Florian

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-22 19:59     ` Florian Fainelli
@ 2013-10-22 20:25       ` Neil Horman
  2013-10-22 22:09         ` Florian Fainelli
  0 siblings, 1 reply; 41+ messages in thread
From: Neil Horman @ 2013-10-22 20:25 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: John Fastabend, netdev, David Miller, Sascha Hauer,
	Felix Fietkau, John Crispin, Jonas Gorski, Gary Thomas,
	Jamal Hadi Salim

On Tue, Oct 22, 2013 at 12:59:12PM -0700, Florian Fainelli wrote:
> 2013/10/22 John Fastabend <john.r.fastabend@intel.com>:
> > On 10/22/2013 11:23 AM, Florian Fainelli wrote:
> >>
> >> This patch adds an Ethernet Switch generic netlink configuration API
> >> which allows for doing the required configuration of managed Ethernet
> >> switches commonly found in Wireless/Cable/DSL routers in the market.
> >>
> >> Since this API is based on the Generic Netlink infrastructure it is very
> >> easy to extend a particular switch driver to support additional features
> >> and to adapt it to specific switches.
> >>
> >
> >> So far the API includes support for:
> >>
> >> - getting/setting a port VLAN id
> >> - getting/setting VLAN port membership
> >> - getting a port link status
> >> - getting a port statistics counters
> >> - resetting a switch device
> >> - applying a configuration to a switch device
> >>
> >
> > Did you consider exposing each physical switch port as a netdevice on
> > the host? I would assume your switch driver could do this.
> >
> > Then you can drop the port specific attributes (link status, stats, etc)
> > and use existing interfaces. The win being my tools work equally well on
> > your real switch as they do on my software switch. Also by exposing net
> > devices you provide a mechanism to send packets over the port and trap
> > control packets.
> 
> Well this is exactly what DSA does and which I do not like because it
> is completely overkill for most switches out there which are using
> 802.1q tags and do not prepend/append proprietary tags for internal
> traffic classification.
> 
> >
> > Next instead of creating a switch specific netlink API could you use
> > the existing FDB API? Again what I would like is for my existing
> > applications to run on the switch without having to rewrite them. For
> > example it would be great to have 'bridge fdb show dev myswitch' report
> > the correct tables for both the Sw bridge, a real switch bridge, and
> > for the embedded SR-IOV bridge case.
> 
> Ok, I know nothing about the FDB API, but will take a look and see if
> that sounds suitable for the embedded use cases.
> 
Further to Johns comments, why are you creating a new netlink protocol for this?
It seems that 90% of what you want to accomplish above is handled by rtnetlink.
As long as you write your driver properly, most of that should "just work".

Neil

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
       [not found]         ` <1382477150.19269.69.camel@dcbw.foobar.com>
@ 2013-10-22 21:22           ` David Miller
  0 siblings, 0 replies; 41+ messages in thread
From: David Miller @ 2013-10-22 21:22 UTC (permalink / raw)
  To: dcbw; +Cc: f.fainelli, netdev, s.hauer, nbd, blogic, jogo, gary

From: Dan Williams <dcbw@redhat.com>
Date: Tue, 22 Oct 2013 16:25:50 -0500

> On Tue, 2013-10-22 at 15:47 -0400, David Miller wrote:
>> From: Florian Fainelli <f.fainelli@gmail.com>
>> Date: Tue, 22 Oct 2013 12:32:29 -0700
>> 
>> > 2013/10/22 Dan Williams <dcbw@redhat.com>:
>> >> On Tue, 2013-10-22 at 11:23 -0700, Florian Fainelli wrote:
>> >>> This patch adds an Ethernet Switch generic netlink configuration API
>> >>> which allows for doing the required configuration of managed Ethernet
>> >>> switches commonly found in Wireless/Cable/DSL routers in the market.
>> >>
>> >> "swconfig" probably means "switch config", but is there any way to
>> >> rename this away from the "sw" prefix, since "sw" typically means
>> >> "software" and not "switch"?
>> > 
>> > Sure, how about something like "enetsw"? I would like to avoid using
>> > "switch" too much since this is a C reserved keyword.
>> 
>> "swtch"? :-)
> 
> haha...  seriously though, "enetsw" or even "esw" or "ensw" would be
> better than plain "sw".  Your choice, I have no horse in the race other
> than the "not sw" horse :)

"enetsw" is fine by me :-)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-22 20:25       ` Neil Horman
@ 2013-10-22 22:09         ` Florian Fainelli
  2013-10-23 11:34           ` Neil Horman
  2013-10-23 11:47           ` Jamal Hadi Salim
  0 siblings, 2 replies; 41+ messages in thread
From: Florian Fainelli @ 2013-10-22 22:09 UTC (permalink / raw)
  To: Neil Horman
  Cc: John Fastabend, netdev, David Miller, Sascha Hauer,
	Felix Fietkau, John Crispin, Jonas Gorski, Gary Thomas,
	Jamal Hadi Salim

2013/10/22 Neil Horman <nhorman@tuxdriver.com>:
> On Tue, Oct 22, 2013 at 12:59:12PM -0700, Florian Fainelli wrote:
>> 2013/10/22 John Fastabend <john.r.fastabend@intel.com>:
>> > On 10/22/2013 11:23 AM, Florian Fainelli wrote:
>> >>
>> >> This patch adds an Ethernet Switch generic netlink configuration API
>> >> which allows for doing the required configuration of managed Ethernet
>> >> switches commonly found in Wireless/Cable/DSL routers in the market.
>> >>
>> >> Since this API is based on the Generic Netlink infrastructure it is very
>> >> easy to extend a particular switch driver to support additional features
>> >> and to adapt it to specific switches.
>> >>
>> >
>> >> So far the API includes support for:
>> >>
>> >> - getting/setting a port VLAN id
>> >> - getting/setting VLAN port membership
>> >> - getting a port link status
>> >> - getting a port statistics counters
>> >> - resetting a switch device
>> >> - applying a configuration to a switch device
>> >>
>> >
>> > Did you consider exposing each physical switch port as a netdevice on
>> > the host? I would assume your switch driver could do this.
>> >
>> > Then you can drop the port specific attributes (link status, stats, etc)
>> > and use existing interfaces. The win being my tools work equally well on
>> > your real switch as they do on my software switch. Also by exposing net
>> > devices you provide a mechanism to send packets over the port and trap
>> > control packets.
>>
>> Well this is exactly what DSA does and which I do not like because it
>> is completely overkill for most switches out there which are using
>> 802.1q tags and do not prepend/append proprietary tags for internal
>> traffic classification.
>>
>> >
>> > Next instead of creating a switch specific netlink API could you use
>> > the existing FDB API? Again what I would like is for my existing
>> > applications to run on the switch without having to rewrite them. For
>> > example it would be great to have 'bridge fdb show dev myswitch' report
>> > the correct tables for both the Sw bridge, a real switch bridge, and
>> > for the embedded SR-IOV bridge case.
>>
>> Ok, I know nothing about the FDB API, but will take a look and see if
>> that sounds suitable for the embedded use cases.
>>
> Further to Johns comments, why are you creating a new netlink protocol for this?
> It seems that 90% of what you want to accomplish above is handled by rtnetlink.
> As long as you write your driver properly, most of that should "just work".

This is not a new netlink protocol, but a generic netlink family. Why
would I extend rtnetlink to cover the remaining 10% which are not
going to be used on desktop and servers when a new generic netlink
family is cheap and can be selectively disabled in the kernel?
-- 
Florian

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-22 22:09         ` Florian Fainelli
@ 2013-10-23 11:34           ` Neil Horman
  2013-10-23 11:47           ` Jamal Hadi Salim
  1 sibling, 0 replies; 41+ messages in thread
From: Neil Horman @ 2013-10-23 11:34 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: John Fastabend, netdev, David Miller, Sascha Hauer,
	Felix Fietkau, John Crispin, Jonas Gorski, Gary Thomas,
	Jamal Hadi Salim

On Tue, Oct 22, 2013 at 03:09:32PM -0700, Florian Fainelli wrote:
> 2013/10/22 Neil Horman <nhorman@tuxdriver.com>:
> > On Tue, Oct 22, 2013 at 12:59:12PM -0700, Florian Fainelli wrote:
> >> 2013/10/22 John Fastabend <john.r.fastabend@intel.com>:
> >> > On 10/22/2013 11:23 AM, Florian Fainelli wrote:
> >> >>
> >> >> This patch adds an Ethernet Switch generic netlink configuration API
> >> >> which allows for doing the required configuration of managed Ethernet
> >> >> switches commonly found in Wireless/Cable/DSL routers in the market.
> >> >>
> >> >> Since this API is based on the Generic Netlink infrastructure it is very
> >> >> easy to extend a particular switch driver to support additional features
> >> >> and to adapt it to specific switches.
> >> >>
> >> >
> >> >> So far the API includes support for:
> >> >>
> >> >> - getting/setting a port VLAN id
> >> >> - getting/setting VLAN port membership
> >> >> - getting a port link status
> >> >> - getting a port statistics counters
> >> >> - resetting a switch device
> >> >> - applying a configuration to a switch device
> >> >>
> >> >
> >> > Did you consider exposing each physical switch port as a netdevice on
> >> > the host? I would assume your switch driver could do this.
> >> >
> >> > Then you can drop the port specific attributes (link status, stats, etc)
> >> > and use existing interfaces. The win being my tools work equally well on
> >> > your real switch as they do on my software switch. Also by exposing net
> >> > devices you provide a mechanism to send packets over the port and trap
> >> > control packets.
> >>
> >> Well this is exactly what DSA does and which I do not like because it
> >> is completely overkill for most switches out there which are using
> >> 802.1q tags and do not prepend/append proprietary tags for internal
> >> traffic classification.
> >>
> >> >
> >> > Next instead of creating a switch specific netlink API could you use
> >> > the existing FDB API? Again what I would like is for my existing
> >> > applications to run on the switch without having to rewrite them. For
> >> > example it would be great to have 'bridge fdb show dev myswitch' report
> >> > the correct tables for both the Sw bridge, a real switch bridge, and
> >> > for the embedded SR-IOV bridge case.
> >>
> >> Ok, I know nothing about the FDB API, but will take a look and see if
> >> that sounds suitable for the embedded use cases.
> >>
> > Further to Johns comments, why are you creating a new netlink protocol for this?
> > It seems that 90% of what you want to accomplish above is handled by rtnetlink.
> > As long as you write your driver properly, most of that should "just work".
> 
> This is not a new netlink protocol, but a generic netlink family. Why
Thats hair splitting.  The point I'm making here is that you're creating a new
communication path from user space to the kernel to do something that we already
have a communication path to do.

> would I extend rtnetlink to cover the remaining 10% which are not
> going to be used on desktop and servers when a new generic netlink
> family is cheap and can be selectively disabled in the kernel?
90% of it is already done on servers and desktops using rtnetlink (thats my
point), and you can reasonably add the other 10% (I think), if you just expose
the switch ports as their own ethernet interfaces.  You say DSA is overkill, but
if you just add the other switch ports as their own ethernet interfaces, you
would get most of the above work for free, which seems to me like less overkill
than a new netlink family and userspace tools. 

Regards
Neil

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-22 22:09         ` Florian Fainelli
  2013-10-23 11:34           ` Neil Horman
@ 2013-10-23 11:47           ` Jamal Hadi Salim
  2013-10-23 12:04             ` Felix Fietkau
  1 sibling, 1 reply; 41+ messages in thread
From: Jamal Hadi Salim @ 2013-10-23 11:47 UTC (permalink / raw)
  To: Florian Fainelli, Neil Horman
  Cc: John Fastabend, netdev, David Miller, Sascha Hauer,
	Felix Fietkau, John Crispin, Jonas Gorski, Gary Thomas,
	Vlad Yasevich, Stephen Hemminger

On 10/22/13 18:09, Florian Fainelli wrote:
> 2013/10/22 Neil Horman <nhorman@tuxdriver.com>:
>> On Tue, Oct 22, 2013 at 12:59:12PM -0700, Florian Fainelli wrote:
>>> 2013/10/22 John Fastabend <john.r.fastabend@intel.com>:
>>>> On 10/22/2013 11:23 AM, Florian Fainelli wrote:
>>>>>
>
>>>
>>> Ok, I know nothing about the FDB API, but will take a look and see if
>>> that sounds suitable for the embedded use cases.
>>>
>> Further to Johns comments, why are you creating a new netlink protocol for this?
>> It seems that 90% of what you want to accomplish above is handled by rtnetlink.
>> As long as you write your driver properly, most of that should "just work".
>
> This is not a new netlink protocol, but a generic netlink family. Why
> would I extend rtnetlink to cover the remaining 10% which are not
> going to be used on desktop and servers when a new generic netlink
> family is cheap and can be selectively disabled in the kernel?
>

Florian,

I think it would be fantastic if you adopt the FDB API. The comment
to use rtnetlink configure is valid. You can configure hardware
switches as John has shown. I realize you guys have invested
tons of time and this stuff has been tested by tons of people and this
is a painful exercise to go through, but:
having more than one approach for configuring/controlling kernel
switch interfaces is not ideal. If you use the rtnetlink API then one
can configure both the Linux bridge, embedded intel switches, etc with
iproute2. i.e the switch becomes a bridge. I see a lot of commonolity
between your model based on what you described and the current bridge.
Pull the latest iproute2 code and look at "bridge" command.

Essentially, the current bridged could be described as an entity
that does L2 switching:
a) it has bridge ports which are any netdevs on Linux
b) it has an FDB which constitutes a MAC address as the lookup and 
optionally a VLAN. You can control learning and flooding.
c) it has vlan filtering capabilities which you can turn on/off. The
vlan capability to sellect PVIDs is also built in.
d) It has multicast snooping

I think your model needs #a and #b, you can ignore the rest.
I am not quiet sure how vlan port membership will apply; an fdb for
each entry will have a vlan. You could also create one bridge per vlan
(not the best  approach) - ccing Vlad and Stephen.


cheers,
jamal

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-23 11:47           ` Jamal Hadi Salim
@ 2013-10-23 12:04             ` Felix Fietkau
  2013-10-23 12:53               ` Jamal Hadi Salim
  0 siblings, 1 reply; 41+ messages in thread
From: Felix Fietkau @ 2013-10-23 12:04 UTC (permalink / raw)
  To: Jamal Hadi Salim, Florian Fainelli, Neil Horman
  Cc: John Fastabend, netdev, David Miller, Sascha Hauer, John Crispin,
	Jonas Gorski, Gary Thomas, Vlad Yasevich, Stephen Hemminger

On 2013-10-23 1:47 PM, Jamal Hadi Salim wrote:
> Florian,
> 
> I think it would be fantastic if you adopt the FDB API. The comment
> to use rtnetlink configure is valid. You can configure hardware
> switches as John has shown. I realize you guys have invested
> tons of time and this stuff has been tested by tons of people and this
> is a painful exercise to go through, but:
> having more than one approach for configuring/controlling kernel
> switch interfaces is not ideal. If you use the rtnetlink API then one
> can configure both the Linux bridge, embedded intel switches, etc with
> iproute2. i.e the switch becomes a bridge. I see a lot of commonolity
> between your model based on what you described and the current bridge.
> Pull the latest iproute2 code and look at "bridge" command.
> 
> Essentially, the current bridged could be described as an entity
> that does L2 switching:
> a) it has bridge ports which are any netdevs on Linux
> b) it has an FDB which constitutes a MAC address as the lookup and 
> optionally a VLAN. You can control learning and flooding.
> c) it has vlan filtering capabilities which you can turn on/off. The
> vlan capability to sellect PVIDs is also built in.
> d) It has multicast snooping
> 
> I think your model needs #a and #b, you can ignore the rest.
> I am not quiet sure how vlan port membership will apply; an fdb for
> each entry will have a vlan. You could also create one bridge per vlan
> (not the best  approach) - ccing Vlad and Stephen.
I still don't understand how this is supposed to work with the kind of
switches that we're supporting with swconfig.

A typical switch has something like 5-8 ports (+ one port that goes to
the CPU), and handles the entire forwarding path on its own. It usually
allows creating VLANs and assigning ports to them (tagged, untagged),
but many (probably most) switches do not support controlling the
forwarding path via a MAC address based FDB.

Many also do not have support for a packet header to indicate the
incoming/outgoing switch port, so creating one netdev per port will work
only for link status, not for the data path.

- Felix

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-23 12:04             ` Felix Fietkau
@ 2013-10-23 12:53               ` Jamal Hadi Salim
  2013-10-23 13:31                 ` Felix Fietkau
  2013-10-29 23:12                 ` Maxime Bizon
  0 siblings, 2 replies; 41+ messages in thread
From: Jamal Hadi Salim @ 2013-10-23 12:53 UTC (permalink / raw)
  To: Felix Fietkau, Florian Fainelli, Neil Horman
  Cc: John Fastabend, netdev, David Miller, Sascha Hauer, John Crispin,
	Jonas Gorski, Gary Thomas, Vlad Yasevich, Stephen Hemminger

On 10/23/13 08:04, Felix Fietkau wrote:


> A typical switch has something like 5-8 ports (+ one port that goes to
> the CPU),

My opinion:
So exposing the 5-8 ports as netdevs would be useful. Giving access to
their stats through per-port netdevs etc. i.e a switch/bridge will show
up on bootup and the 5-8 ports as well. The 5-8 ports will show up
as bridge ports to the switch.
If something requires other "services" like l3 - I am assuming that
would show up in the cpu port, but its role is really to demux
and send it to ingress of the originating port on ASIC (i.e dont
think it should be exposed).

>and handles the entire forwarding path on its own.

This is default behavior. i.e learning and flooding.
Can you at least retrieve the fdb? example how to figure out which
port a specific MAC address resides?

>It usually
> allows creating VLANs and assigning ports to them (tagged, untagged),

I wasnt sure about the vlans<->port mapping as i stated in the earlier
email. So on this issue, I can see the challenge.
You could of course put vlan netdevs on top of switch ports and then
attach those to the bridge, but i cant see an approach if a switch port
can support more than one vlan without having multiple bridges. example:
bridgeA: link ports {swp0:vlan1, swp1:vlan2, swp0:vlan4}
bridgeB: link ports {swp0:vlan3, swp1:vlan4, swp1:vlan2}


 > but many (probably most) switches do not support controlling the
> forwarding path via a MAC address based FDB.
>

Ok, so operations like fdb_add/del will be disallowed. This is really
up to the driver to not expose such ops.

> Many also do not have support for a packet header to indicate the
> incoming/outgoing switch port, so creating one netdev per port will work
> only for link status, not for the data path.

You mean when such a packet arrives on the "cpu" port, you wont know the
originating port?

cheers,
jamal

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-23 12:53               ` Jamal Hadi Salim
@ 2013-10-23 13:31                 ` Felix Fietkau
  2013-10-23 14:09                   ` Jamal Hadi Salim
  2013-10-29 23:12                 ` Maxime Bizon
  1 sibling, 1 reply; 41+ messages in thread
From: Felix Fietkau @ 2013-10-23 13:31 UTC (permalink / raw)
  To: Jamal Hadi Salim, Florian Fainelli, Neil Horman
  Cc: John Fastabend, netdev, David Miller, Sascha Hauer, John Crispin,
	Jonas Gorski, Gary Thomas, Vlad Yasevich, Stephen Hemminger

On 2013-10-23 2:53 PM, Jamal Hadi Salim wrote:
> On 10/23/13 08:04, Felix Fietkau wrote:
> 
> 
>> A typical switch has something like 5-8 ports (+ one port that goes to
>> the CPU),
> 
> My opinion:
> So exposing the 5-8 ports as netdevs would be useful. Giving access to
> their stats through per-port netdevs etc. i.e a switch/bridge will show
> up on bootup and the 5-8 ports as well. The 5-8 ports will show up
> as bridge ports to the switch.
So you would like to have 'dummy' netdevs that don't actually work like
real ones, just to get stats?

> If something requires other "services" like l3 - I am assuming that
> would show up in the cpu port, but its role is really to demux
> and send it to ingress of the originating port on ASIC (i.e dont
> think it should be exposed).
Many of these switches are designed to work completely standalone, i.e.
they receive their configuration once and then do their thing, often
they don't even have special treatment for the CPU port.

>>and handles the entire forwarding path on its own.
> 
> This is default behavior. i.e learning and flooding.
> Can you at least retrieve the fdb? example how to figure out which
> port a specific MAC address resides?
On some of them, but not all.

>>It usually
>> allows creating VLANs and assigning ports to them (tagged, untagged),
> 
> I wasnt sure about the vlans<->port mapping as i stated in the earlier
> email. So on this issue, I can see the challenge.
> You could of course put vlan netdevs on top of switch ports and then
> attach those to the bridge, but i cant see an approach if a switch port
> can support more than one vlan without having multiple bridges. example:
> bridgeA: link ports {swp0:vlan1, swp1:vlan2, swp0:vlan4}
> bridgeB: link ports {swp0:vlan3, swp1:vlan4, swp1:vlan2}
So even more dummy interfaces that serve no real purpose other than
configuration?

>  > but many (probably most) switches do not support controlling the
>> forwarding path via a MAC address based FDB.
> 
> Ok, so operations like fdb_add/del will be disallowed. This is really
> up to the driver to not expose such ops.
> 
>> Many also do not have support for a packet header to indicate the
>> incoming/outgoing switch port, so creating one netdev per port will work
>> only for link status, not for the data path.
> 
> You mean when such a packet arrives on the "cpu" port, you wont know the
> originating port?
Correct. I still get the impression that the model you're describing is
mostly incompatible with what we're trying to do, and comes at the cost
of quite a bit of extra complexity and bloat, not just on the
implementation side, but on the configuration side as well.
It also seems to make it more difficult to support vendor specific
features. I strongly doubt that the slight increase in consistency
between different kinds of switches/bridges is worth all of these extra
costs.

- Felix

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-23 13:31                 ` Felix Fietkau
@ 2013-10-23 14:09                   ` Jamal Hadi Salim
  2013-10-23 14:32                     ` Felix Fietkau
  0 siblings, 1 reply; 41+ messages in thread
From: Jamal Hadi Salim @ 2013-10-23 14:09 UTC (permalink / raw)
  To: Felix Fietkau, Florian Fainelli, Neil Horman
  Cc: John Fastabend, netdev, David Miller, Sascha Hauer, John Crispin,
	Jonas Gorski, Gary Thomas, Vlad Yasevich, Stephen Hemminger

On 10/23/13 09:31, Felix Fietkau wrote:
> On 2013-10-23 2:53 PM, Jamal Hadi Salim wrote:

> So you would like to have 'dummy' netdevs that don't actually work like
> real ones, just to get stats?

Not just stats, but other utilities, example:
*operational status read and admin status control,
*MAC address setting?
*MTU setting
* If something shows up on the cpu port and comes up, we can make it 
appear to be from such a netdev (for the case where this applies)
* etc

> Many of these switches are designed to work completely standalone, i.e.
> they receive their configuration once and then do their thing, often
> they don't even have special treatment for the CPU port.
>

So if i understood the worst case scenario:
- no packets will ever come to the CPU
- minimal config only such as configuring ports and what vlans they
accept
- you cant query the device for anything else not even stats

>> Can you at least retrieve the fdb? example how to figure out which
>> port a specific MAC address resides?
> On some of them, but not all.
>

I think this would be a fit for netdev->features to set capabilities at
initialization.
So canSetfdb, canGetfdb, canDelfdb etc


>> can support more than one vlan without having multiple bridges. example:
>> bridgeA: link ports {swp0:vlan1, swp1:vlan2, swp0:vlan4}
>> bridgeB: link ports {swp0:vlan3, swp1:vlan4, swp1:vlan2}
> So even more dummy interfaces that serve no real purpose other than
> configuration?

Yes. It may sound rediculous(trademark for that owned by DaveM), but
given the returns that all other classical linux tools work, I think it
is worth it.
Disclaimer: I still think this part is acrobatic in nature i.e no good
one-to-one mapping

> Correct.

How do you deal with those situations today example when a packet
shows up in the cpu port and they require routing?
Do you have one monolithic switch netdev ?

>I still get the impression that the model you're describing is
> mostly incompatible with what we're trying to do, and comes at the cost
> of quite a bit of extra complexity and bloat, not just on the
> implementation side, but on the configuration side as well.

/Sigh
I understand it is a dilema especially when you have your model proven
already with users.
The danger is one-offs where certain tools only work with certain
instantiations of common features. From a usability perspective,
it would be nice to use iproute2, ifconfig etc on the switch/ports and
not learn another tool (or program the switch to a different API).

> It also seems to make it more difficult to support vendor specific
> features. I strongly doubt that the slight increase in consistency
> between different kinds of switches/bridges is worth all of these extra
> costs.

I am not privy to what specific vendor features exist that are out of
whack. But note:
We have ability to set capabilities (netdev->features is one, but you 
can add another netdev->field). Would it not make sense for the driver
  to set such capabilities and the generic code to turn on/off certain
things? Example turn on netdev->ops->fdb_add if the switch is capable
etc.

cheers,
jamal

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-23 14:09                   ` Jamal Hadi Salim
@ 2013-10-23 14:32                     ` Felix Fietkau
  2013-10-25 11:43                       ` Jamal Hadi Salim
  0 siblings, 1 reply; 41+ messages in thread
From: Felix Fietkau @ 2013-10-23 14:32 UTC (permalink / raw)
  To: Jamal Hadi Salim, Florian Fainelli, Neil Horman
  Cc: John Fastabend, netdev, David Miller, Sascha Hauer, John Crispin,
	Jonas Gorski, Gary Thomas, Vlad Yasevich, Stephen Hemminger

On 2013-10-23 4:09 PM, Jamal Hadi Salim wrote:
> On 10/23/13 09:31, Felix Fietkau wrote:
>> On 2013-10-23 2:53 PM, Jamal Hadi Salim wrote:
> 
>> So you would like to have 'dummy' netdevs that don't actually work like
>> real ones, just to get stats?
> 
> Not just stats, but other utilities, example:
> *operational status read and admin status control,

> *MAC address setting?
Typically ignored by switches.

> *MTU setting
Can usually not be controlled per-port. Where supported, it is usually a
global configuration parameter for the switch.

> * If something shows up on the cpu port and comes up, we can make it 
> appear to be from such a netdev (for the case where this applies)
I think that's actually more confusing for users if they find the same
kind of devices on multiple different switches, and on some they can be
used directly, on others they cannot.

> * etc
> 
>> Many of these switches are designed to work completely standalone, i.e.
>> they receive their configuration once and then do their thing, often
>> they don't even have special treatment for the CPU port.
>>
> 
> So if i understood the worst case scenario:
> - no packets will ever come to the CPU
> - minimal config only such as configuring ports and what vlans they
> accept
> - you cant query the device for anything else not even stats
Correct.

>>> Can you at least retrieve the fdb? example how to figure out which
>>> port a specific MAC address resides?
>> On some of them, but not all.
> I think this would be a fit for netdev->features to set capabilities at
> initialization.
> So canSetfdb, canGetfdb, canDelfdb etc

>>> can support more than one vlan without having multiple bridges. example:
>>> bridgeA: link ports {swp0:vlan1, swp1:vlan2, swp0:vlan4}
>>> bridgeB: link ports {swp0:vlan3, swp1:vlan4, swp1:vlan2}
>> So even more dummy interfaces that serve no real purpose other than
>> configuration?
> 
> Yes. It may sound rediculous(trademark for that owned by DaveM), but
> given the returns that all other classical linux tools work, I think it
> is worth it.
The classical Linux tools here only cover the most basic configuration
parts. In many cases, separate configuration options are needed. For
example, on some switches, forwarding table IDs can be assigned to VLANs.
Also, the switch driver is completely independent of the network device
driver that drives the port connected to the CPU port of the switch. The
only ways I can imagine implementing this in the Linux network stack
involve an unhealthy amount of layering violations or other forms of
ugly hackery.

> Disclaimer: I still think this part is acrobatic in nature i.e no good
> one-to-one mapping
> 
>> Correct.
> 
> How do you deal with those situations today example when a packet
> shows up in the cpu port and they require routing?
> Do you have one monolithic switch netdev ?
The switch driver usually attaches itself as a PHY driver, there is no
monolithic switch netdev.

>>I still get the impression that the model you're describing is
>> mostly incompatible with what we're trying to do, and comes at the cost
>> of quite a bit of extra complexity and bloat, not just on the
>> implementation side, but on the configuration side as well.
> 
> /Sigh
> I understand it is a dilema especially when you have your model proven
> already with users.
> The danger is one-offs where certain tools only work with certain
> instantiations of common features. From a usability perspective,
> it would be nice to use iproute2, ifconfig etc on the switch/ports and
> not learn another tool (or program the switch to a different API).
I fully agree that this would be nice to have. I've given quite a bit of
thought to trying to figure out if there's a simple clean way to
implement this, but in all of the proposals I've seen so far, the costs
(complexity, bloat, quirky interfaces) seem to massively outweigh the
benefits.

>> It also seems to make it more difficult to support vendor specific
>> features. I strongly doubt that the slight increase in consistency
>> between different kinds of switches/bridges is worth all of these extra
>> costs.
> 
> I am not privy to what specific vendor features exist that are out of
> whack. But note:
> We have ability to set capabilities (netdev->features is one, but you 
> can add another netdev->field). Would it not make sense for the driver
>   to set such capabilities and the generic code to turn on/off certain
> things? Example turn on netdev->ops->fdb_add if the switch is capable
> etc.
I don't think bloating up the netdev feature flags for lots of
single-vendor fields is a good idea. swconfig simply allows the driver
to register its own global, per-port and per-vlan attributes and user
space can discover them.

That also avoids the nasty issue of userspace code having to know about
all possible vendor specific features and bits of status information.

- Felix

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-23 14:32                     ` Felix Fietkau
@ 2013-10-25 11:43                       ` Jamal Hadi Salim
  2013-10-25 13:01                         ` Felix Fietkau
  0 siblings, 1 reply; 41+ messages in thread
From: Jamal Hadi Salim @ 2013-10-25 11:43 UTC (permalink / raw)
  To: Felix Fietkau, Florian Fainelli, Neil Horman
  Cc: John Fastabend, netdev, David Miller, Sascha Hauer, John Crispin,
	Jonas Gorski, Gary Thomas, Vlad Yasevich, Stephen Hemminger

Hi Felix,

Sorry for the latency - some distractions on the side.

On 10/23/13 10:32, Felix Fietkau wrote:
> On 2013-10-23 4:09 PM, Jamal Hadi Salim wrote:

>> *MAC address setting?
> Typically ignored by switches.
>

Ok, I take it the minority allow you to do this.
For most, the switch port has some factory shipped MAC address?

>> *MTU setting
> Can usually not be controlled per-port. Where supported, it is usually a
> global configuration parameter for the switch.

Does that mean one mtu for all switch ports on such devices?


>> * If something shows up on the cpu port and comes up, we can make it
>> appear to be from such a netdev (for the case where this applies)
> I think that's actually more confusing for users if they find the same
> kind of devices on multiple different switches, and on some they can be
> used directly, on others they cannot.
>

But how does it work today for the case where you have one chip that
wont pass up the tag to the cpu and another that does? i.e what
happens to packets that end up being shunted to CPU?

> The classical Linux tools here only cover the most basic configuration
> parts. In many cases, separate configuration options are needed. For
> example, on some switches, forwarding table IDs can be assigned to VLANs.

Multiple forwarding tables?

> Also, the switch driver is completely independent of the network device
> driver that drives the port connected to the CPU port of the switch.

I guess this is because one piece manages attributes and other is
for packet processing?
There is good precedence in a few embedded systems which are
equally challenged but still expose ports as netdevs.

>The
> only ways I can imagine implementing this in the Linux network stack
> involve an unhealthy amount of layering violations or other forms of
> ugly hackery.
>


> The switch driver usually attaches itself as a PHY driver, there is no
> monolithic switch netdev.
>

Shouldnt the PHY driver be owned by some netdev?

> I fully agree that this would be nice to have. I've given quite a bit of
> thought to trying to figure out if there's a simple clean way to
> implement this, but in all of the proposals I've seen so far, the costs
> (complexity, bloat, quirky interfaces) seem to massively outweigh the
> benefits.
>

I can understand the massive differences in capabilities make this
harder to retrofit. But if the only cause for impendance mismatch
is these capability differences, I think it can be resolved.
We need a way to discover them and only use those available.

> I don't think bloating up the netdev feature flags for lots of
> single-vendor fields is a good idea.

I agree if you say there is a variety of capabilities.
But if this is to be resolved - there has to be a way for these
capabilities to be advertised by low level (and netdev->features
is our only vehicle at the moment). We could have switch features
in addition etc etc.

> swconfig simply allows the driver
> to register its own global, per-port and per-vlan attributes and user
> space can discover them.
>
> That also avoids the nasty issue of userspace code having to know about
> all possible vendor specific features and bits of status information.
>

So it seems to me you already have taken care of this piece.
Why not pull that into the netdev or bridge core and then re-use it?

cheers,
jamal

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-25 11:43                       ` Jamal Hadi Salim
@ 2013-10-25 13:01                         ` Felix Fietkau
  2013-10-27 17:19                           ` Jamal Hadi Salim
  0 siblings, 1 reply; 41+ messages in thread
From: Felix Fietkau @ 2013-10-25 13:01 UTC (permalink / raw)
  To: Jamal Hadi Salim, Florian Fainelli, Neil Horman
  Cc: John Fastabend, netdev, David Miller, Sascha Hauer, John Crispin,
	Jonas Gorski, Gary Thomas, Vlad Yasevich, Stephen Hemminger

On 2013-10-25 1:43 PM, Jamal Hadi Salim wrote:
> Hi Felix,
> 
> Sorry for the latency - some distractions on the side.
> 
> On 10/23/13 10:32, Felix Fietkau wrote:
>> On 2013-10-23 4:09 PM, Jamal Hadi Salim wrote:
> 
>>> *MAC address setting?
>> Typically ignored by switches.
>>
> 
> Ok, I take it the minority allow you to do this.
> For most, the switch port has some factory shipped MAC address?
I think it's common for the switch to have a global MAC address, not a
per-port one.

>>> *MTU setting
>> Can usually not be controlled per-port. Where supported, it is usually a
>> global configuration parameter for the switch.
> 
> Does that mean one mtu for all switch ports on such devices?
Correct.

>>> * If something shows up on the cpu port and comes up, we can make it
>>> appear to be from such a netdev (for the case where this applies)
>> I think that's actually more confusing for users if they find the same
>> kind of devices on multiple different switches, and on some they can be
>> used directly, on others they cannot.
> 
> But how does it work today for the case where you have one chip that
> wont pass up the tag to the cpu and another that does? i.e what
> happens to packets that end up being shunted to CPU?
'won't pass up the tag'? The switch is treated in pretty much the same
way as a normal managed standalone switch (you know, one you can buy in
a shop and plug your Ethernet cable into).
You simply tell it, which VLANs to put on which ports, and make the
ports tagged or untagged.
The link between the switch and the CPU is not really special, for the
switch it's just another port. This way of configuring works with pretty
much all switches that we're using.

>> The classical Linux tools here only cover the most basic configuration
>> parts. In many cases, separate configuration options are needed. For
>> example, on some switches, forwarding table IDs can be assigned to VLANs.
> 
> Multiple forwarding tables?
Yes, some switches have them, and they can be useful when dealing with
multiple VLANs.

>> Also, the switch driver is completely independent of the network device
>> driver that drives the port connected to the CPU port of the switch.
> 
> I guess this is because one piece manages attributes and other is
> for packet processing?
> There is good precedence in a few embedded systems which are
> equally challenged but still expose ports as netdevs.
No, because the connection between the CPU and the switch is handled by
a normal Ethernet MAC. The Ethernet chip doesn't care if there's a
switch connected to it, or a regular PHY.
It's just a normal MII connection, nothing more.

>>The
>> only ways I can imagine implementing this in the Linux network stack
>> involve an unhealthy amount of layering violations or other forms of
>> ugly hackery.
>> The switch driver usually attaches itself as a PHY driver, there is no
>> monolithic switch netdev.
> 
> Shouldnt the PHY driver be owned by some netdev?
Right, the netdev that owns the PHY is a normal Ethernet MAC, running
any normal Linux Ethernet driver.

>> I fully agree that this would be nice to have. I've given quite a bit of
>> thought to trying to figure out if there's a simple clean way to
>> implement this, but in all of the proposals I've seen so far, the costs
>> (complexity, bloat, quirky interfaces) seem to massively outweigh the
>> benefits.
> I can understand the massive differences in capabilities make this
> harder to retrofit. But if the only cause for impendance mismatch
> is these capability differences, I think it can be resolved.
> We need a way to discover them and only use those available.
I remain absolutely unconvinced that this will make the end result
better. Right now, these switches act like separate devices, because
aside from the fact that they're put on the same board with other
components, they pretty much *are* separate devices.

You seem to insist on treating it as a kind of port multiplexer + bridge
accelerator instead of a mostly standalone switch.

This may work for some devices, but on others this simply a model that
the hardware wasn't designed for. Sure, we could try to cram in all
those special cases, extra options, and hack through the layers where
they're in the way. If *all* you care about is being able to reuse the
existing interfaces, that might even seem like a good idea.

On the other hand, I've pointed out quite a few examples where the model
of trying to cram it into the bridge API is just a bad fit in general.

>> I don't think bloating up the netdev feature flags for lots of
>> single-vendor fields is a good idea.
> 
> I agree if you say there is a variety of capabilities.
> But if this is to be resolved - there has to be a way for these
> capabilities to be advertised by low level (and netdev->features
> is our only vehicle at the moment). We could have switch features
> in addition etc etc.
Aside from the fact that the swconfig code is already there, the model
that it uses is inherently simple. I worry about all the extra
complexity that we will have to add to try to retrofit this into a
mostly incompatible configuration model.

>> swconfig simply allows the driver
>> to register its own global, per-port and per-vlan attributes and user
>> space can discover them.
>>
>> That also avoids the nasty issue of userspace code having to know about
>> all possible vendor specific features and bits of status information.
> So it seems to me you already have taken care of this piece.
> Why not pull that into the netdev or bridge core and then re-use it?
Because it just doesn't fit there very well.

- Felix

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-25 13:01                         ` Felix Fietkau
@ 2013-10-27 17:19                           ` Jamal Hadi Salim
  2013-10-27 18:14                             ` Florian Fainelli
  2013-10-27 19:51                             ` Felix Fietkau
  0 siblings, 2 replies; 41+ messages in thread
From: Jamal Hadi Salim @ 2013-10-27 17:19 UTC (permalink / raw)
  To: Felix Fietkau, Florian Fainelli, Neil Horman
  Cc: John Fastabend, netdev, David Miller, Sascha Hauer, John Crispin,
	Jonas Gorski, Gary Thomas, Vlad Yasevich, Stephen Hemminger

On 10/25/13 09:01, Felix Fietkau wrote:
> On 2013-10-25 1:43 PM, Jamal Hadi Salim wrote:

> I think it's common for the switch to have a global MAC address, not a
> per-port one.

Ok, I see. Real cheep.

> 'won't pass up the tag'? The switch is treated in pretty much the same
> way as a normal managed standalone switch (you know, one you can buy in
> a shop and plug your Ethernet cable into).
> You simply tell it, which VLANs to put on which ports, and make the
> ports tagged or untagged.
> The link between the switch and the CPU is not really special, for the
> switch it's just another port. This way of configuring works with pretty
> much all switches that we're using.

So does it get its own MAC address? Other than flooding broadcasts,
how does one end up sending packets to the cpu?

> Yes, some switches have them, and they can be useful when dealing with
> multiple VLANs.

Very nice. So we go from one extreme of cheep to sophisticated ;->
I think the only way you can achieve multiple tables on the bridge
is by creating multiple bridges.

> No, because the connection between the CPU and the switch is handled by
> a normal Ethernet MAC. The Ethernet chip doesn't care if there's a
> switch connected to it, or a regular PHY.
> It's just a normal MII connection, nothing more.
>
[..]
 >
 > Right, the netdev that owns the PHY is a normal Ethernet MAC, running
 > any normal Linux Ethernet driver.
 >
[..]
> I remain absolutely unconvinced that this will make the end result
> better. Right now, these switches act like separate devices, because
> aside from the fact that they're put on the same board with other
> components, they pretty much *are* separate devices.
>
> You seem to insist on treating it as a kind of port multiplexer + bridge
> accelerator instead of a mostly standalone switch.
>


Yes, the above is the point i was making.
I apologize for sounding like a broken record, but to just re-iterate:
there are, if i recall correctly, several drivers  in the kernel
which are challenged as such (with single entry point into the CPU)
which expose multiple netdevs with the driver acting as mux point.

> This may work for some devices, but on others this simply a model that
> the hardware wasn't designed for.

I agree. But what i just described above is not new. A lot of embedded
multiport NICs tend to be handicapped in exactly the same way.

> Sure, we could try to cram in all
> those special cases, extra options, and hack through the layers where
> they're in the way. If *all* you care about is being able to reuse the
> existing interfaces, that might even seem like a good idea.
>

I do care a lot about using existing interfaces ;-> Great usability
for someone to run a tool that has been around for 20 years and it
works. If i can just reuse my scripts without having to invent
new ones etc etc.

> On the other hand, I've pointed out quite a few examples where the model
> of trying to cram it into the bridge API is just a bad fit in general.
>

Sorry Felix, nothing you described is insurmountable.
The challenge here is non-technical:
You already have code that has been proven and is deployed for what 
appears to be sometime now.
I totally empathize.

cheers,
jamal

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-27 17:19                           ` Jamal Hadi Salim
@ 2013-10-27 18:14                             ` Florian Fainelli
  2013-10-28 22:29                               ` Jamal Hadi Salim
  2013-10-27 19:51                             ` Felix Fietkau
  1 sibling, 1 reply; 41+ messages in thread
From: Florian Fainelli @ 2013-10-27 18:14 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Felix Fietkau, Neil Horman, John Fastabend, netdev, David Miller,
	Sascha Hauer, John Crispin, Jonas Gorski, Gary Thomas,
	Vlad Yasevich, Stephen Hemminger

2013/10/27 Jamal Hadi Salim <jhs@mojatatu.com>:
> On 10/25/13 09:01, Felix Fietkau wrote:
>>
>> On 2013-10-25 1:43 PM, Jamal Hadi Salim wrote:
>
>
>> I think it's common for the switch to have a global MAC address, not a
>> per-port one.
>
>
> Ok, I see. Real cheep.

They are yes, the only "fancy" features these switches allow is
basically to set a given's port vlan id, which is already a huge
improvement compared to the vendor provided firmware.

>
>
>> 'won't pass up the tag'? The switch is treated in pretty much the same
>> way as a normal managed standalone switch (you know, one you can buy in
>> a shop and plug your Ethernet cable into).
>> You simply tell it, which VLANs to put on which ports, and make the
>> ports tagged or untagged.
>> The link between the switch and the CPU is not really special, for the
>> switch it's just another port. This way of configuring works with pretty
>> much all switches that we're using.
>
>
> So does it get its own MAC address? Other than flooding broadcasts,
> how does one end up sending packets to the cpu?

The switch does have an address learning process which is usually not
controlled by software at all, so yes, flooding is usually the way to
get it to the CPU.

>
>
>> Yes, some switches have them, and they can be useful when dealing with
>> multiple VLANs.
>
>
> Very nice. So we go from one extreme of cheep to sophisticated ;->
> I think the only way you can achieve multiple tables on the bridge
> is by creating multiple bridges.
>
>
>> No, because the connection between the CPU and the switch is handled by
>> a normal Ethernet MAC. The Ethernet chip doesn't care if there's a
>> switch connected to it, or a regular PHY.
>> It's just a normal MII connection, nothing more.
>>
> [..]
>
>>
>> Right, the netdev that owns the PHY is a normal Ethernet MAC, running
>> any normal Linux Ethernet driver.
>>
> [..]
>
>> I remain absolutely unconvinced that this will make the end result
>> better. Right now, these switches act like separate devices, because
>> aside from the fact that they're put on the same board with other
>> components, they pretty much *are* separate devices.
>>
>> You seem to insist on treating it as a kind of port multiplexer + bridge
>> accelerator instead of a mostly standalone switch.
>>
>
>
> Yes, the above is the point i was making.
> I apologize for sounding like a broken record, but to just re-iterate:
> there are, if i recall correctly, several drivers  in the kernel
> which are challenged as such (with single entry point into the CPU)
> which expose multiple netdevs with the driver acting as mux point.

Which exact drivers are you refering to? If we are talking about DSA
then yes, this is correct, but it is completely Ethernet MAC driver
agnostic.

>
>
>> This may work for some devices, but on others this simply a model that
>> the hardware wasn't designed for.
>
>
> I agree. But what i just described above is not new. A lot of embedded
> multiport NICs tend to be handicapped in exactly the same way.

Why would we expose the hardware switch physical ports as netdevs if
we cannot even any control over their data-path? Unlike these
multiport NICs, the only traffic you see and you can control is the
one from your CPU port.

>
>
>> Sure, we could try to cram in all
>> those special cases, extra options, and hack through the layers where
>> they're in the way. If *all* you care about is being able to reuse the
>> existing interfaces, that might even seem like a good idea.
>>
>
> I do care a lot about using existing interfaces ;-> Great usability
> for someone to run a tool that has been around for 20 years and it
> works. If i can just reuse my scripts without having to invent
> new ones etc etc.

I do not really see how we could bend the existing interface (is it
rtnetlink we are talking about or something else btw?) to expose these
switches, maybe we could with iproute2, but still, the user-space
interface/tool is far from being the problem here.


>
>
>> On the other hand, I've pointed out quite a few examples where the model
>> of trying to cram it into the bridge API is just a bad fit in general.
>>
>
> Sorry Felix, nothing you described is insurmountable.
> The challenge here is non-technical:
> You already have code that has been proven and is deployed for what appears
> to be sometime now.
> I totally empathize.

I don't think at any point in this discussion there was a mention that
we do not want to change the user or kernel interface in OpenWrt
because we have been using this for the past 5 years, on the contrary,
if we are bringing this to a wide audience, this is to get some proper
review and eventually change it.
-- 
Florian

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-27 17:19                           ` Jamal Hadi Salim
  2013-10-27 18:14                             ` Florian Fainelli
@ 2013-10-27 19:51                             ` Felix Fietkau
  2013-10-28 22:53                               ` Jamal Hadi Salim
  1 sibling, 1 reply; 41+ messages in thread
From: Felix Fietkau @ 2013-10-27 19:51 UTC (permalink / raw)
  To: Jamal Hadi Salim, Florian Fainelli, Neil Horman
  Cc: John Fastabend, netdev, David Miller, Sascha Hauer, John Crispin,
	Jonas Gorski, Gary Thomas, Vlad Yasevich, Stephen Hemminger

On 2013-10-27 6:19 PM, Jamal Hadi Salim wrote:
> On 10/25/13 09:01, Felix Fietkau wrote:
>> 'won't pass up the tag'? The switch is treated in pretty much the same
>> way as a normal managed standalone switch (you know, one you can buy in
>> a shop and plug your Ethernet cable into).
>> You simply tell it, which VLANs to put on which ports, and make the
>> ports tagged or untagged.
>> The link between the switch and the CPU is not really special, for the
>> switch it's just another port. This way of configuring works with pretty
>> much all switches that we're using.
> 
> So does it get its own MAC address? Other than flooding broadcasts,
> how does one end up sending packets to the cpu?
That question does not make any sense to me. Aside from low level
control frames like pause frames for flow control, the switch has no
need to send packets to the CPU port on its own.
Remember what I told you about the switch being a *separate* entity from
the NIC that connects it to the CPU.

>> I remain absolutely unconvinced that this will make the end result
>> better. Right now, these switches act like separate devices, because
>> aside from the fact that they're put on the same board with other
>> components, they pretty much *are* separate devices.
>>
>> You seem to insist on treating it as a kind of port multiplexer + bridge
>> accelerator instead of a mostly standalone switch.
> 
> Yes, the above is the point i was making.
> I apologize for sounding like a broken record, but to just re-iterate:
> there are, if i recall correctly, several drivers  in the kernel
> which are challenged as such (with single entry point into the CPU)
> which expose multiple netdevs with the driver acting as mux point.
DSA does this, and last time I looked, it pushes *all* bridge traffic
through the CPU, making it completely unusable for slower embedded CPUs.

If I remember correctly, adding support 'bridge acceleration' was left
as an exercise for the reader and never actually implemented.

Sure, this could be fixed somehow, but even then the model and
assumptions that DSA is built on simply don't work for some of the
dumber switches that we support.

>> This may work for some devices, but on others this simply a model that
>> the hardware wasn't designed for.
> 
> I agree. But what i just described above is not new. A lot of embedded
> multiport NICs tend to be handicapped in exactly the same way.
> 
>> Sure, we could try to cram in all
>> those special cases, extra options, and hack through the layers where
>> they're in the way. If *all* you care about is being able to reuse the
>> existing interfaces, that might even seem like a good idea.
> 
> I do care a lot about using existing interfaces ;-> Great usability
> for someone to run a tool that has been around for 20 years and it
> works. If i can just reuse my scripts without having to invent
> new ones etc etc.
I see that. But please stop treating this as the *only* factor that
matters! I'd like to see a more balanced cost/benefit analysis.

>> On the other hand, I've pointed out quite a few examples where the model
>> of trying to cram it into the bridge API is just a bad fit in general.
> 
> Sorry Felix, nothing you described is insurmountable.
I'm not saying it's insurmountable, I'm saying it's impractical!
It makes one aspect (code reuse) better in some cases, while making lots
of other aspects worse.

> The challenge here is non-technical:
> You already have code that has been proven and is deployed for what 
> appears to be sometime now.
> I totally empathize.
Please stop making it look like this is the primary issue. Sure, it's
more convenient for us to reuse the existing code, but it's far from
being the only important factor here!
As an embedded Linux developer, I care a lot about fighting complexity
and bloat, and those do tend to be much harder to deal with than a bit
of API consistency.

I get the sense that trying to communicate on an abstract level gets us
nowhere in this discussion, so let me make it a bit more specific with
some examples:

One of the currently very common switches in many embedded devices is
the RTL8366/RTL8367. It has some flexibility when it comes to
configuring VLANs, and it's one of the few ones where you can configure
a forwarding table for a VLAN (which spans multiple ports), which allows
software bridging between multiple VLANs.
However, what this switch does *not* support is adding a header/trailer
to packets to indicate the originating port.
This means that all per-port netdevs will be dummy ports which don't
include the data path.

So let's say you have a configuration where you're using VLAN ID 4 on
port 1, and you want to bridge it to VLAN ID 400 on port 2.

Sounds easy enough, you can easily create a bridge that spans port1.4
and port2.400. Except, this particular switch (like pretty much any
other switch supported by swconfig) isn't actually able to handle such a
configuration on its own.
It needs two VLAN configurations, with different forwarding table IDs,
and then the software bridge on the CPU port needs to forward between
the two different VLANs.
To be able to handle such a configuration, the code would have to detect
this kind of special case scenario, somehow hook itself via rx handler
into the NIC connected to the CPU port and emulate that VLAN ID
replacement behavior.

With swconfig, you create two VLANs: VLAN 4, containing CPU and port1;
VLAN 400, containing CPU and port2. You then create a software bridge
between eth0.4 and eth0.400 (assuming eth0 is the NIC connected to the
switch).

In a different scenario, the code would also have to detect
configurations that the switch isn't able to handle, e.g.: bridging
port1.4 to eth1 and port2.4 to eth2.
Such a configuration wouldn't work at all with such a switch, because
the CPU isn't able to tell apart traffic from port1 and port2, and
there's no way to tell the switch that port1.4 and port2.4 should not be
connected to each other, but both should go to the CPU.

Those are just two simple scenarios from the top of my head - I'm pretty
sure I could come up with a long list of further corner cases and
quirks, which are simply either difficult to deal with, or completely
unnatural in the model that you're describing.

Trying to make all of these cases work in the code will make the whole
thing a lot more difficult to deal with and maintain. It will also make
it much harder for the user to figure out, what configurations work, and
what configurations don't.

Especially the case with reusing VLANs on different ports (but not
connecting them to each other) is something that can easily work with
software devices, but cannot be emulated on most embedded device
switches. The software bridge configuration model raises a lot of
expectations that these switches simply cannot meet.

If you look at the swconfig model, you will see that the abstraction
clearly communicates the limitations of these typical switches.

The configuration model simply doesn't even let you express these kinds
of unsuppported configurations that seem normal in the tools used to set
up software bridges/vlans.
At the same time, it's fairly consistent across the range of different
chips that we have drivers for. That certainly leaves a much smaller
amount of traps and surprises for users, compared to trying to emulate
the software bridge model by hacking through the layers.

Hopefully this will clear a few things up for you.

- Felix

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-27 18:14                             ` Florian Fainelli
@ 2013-10-28 22:29                               ` Jamal Hadi Salim
  0 siblings, 0 replies; 41+ messages in thread
From: Jamal Hadi Salim @ 2013-10-28 22:29 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Felix Fietkau, Neil Horman, John Fastabend, netdev, David Miller,
	Sascha Hauer, John Crispin, Jonas Gorski, Gary Thomas,
	Vlad Yasevich, Stephen Hemminger

On 10/27/13 14:14, Florian Fainelli wrote:
> 2013/10/27 Jamal Hadi Salim <jhs@mojatatu.com>:
>> On 10/25/13 09:01, Felix Fietkau wrote:
>>>

>
> They are yes, the only "fancy" features these switches allow is
> basically to set a given's port vlan id, which is already a huge
> improvement compared to the vendor provided firmware.
>

Nice to know that you have something better than the vendor provided
stuff.

>
> The switch does have an address learning process which is usually not
> controlled by software at all, so yes, flooding is usually the way to
> get it to the CPU.
>

Ok.

> Which exact drivers are you refering to? If we are talking about DSA
> then yes, this is correct, but it is completely Ethernet MAC driver
> agnostic.
>

Sorry - cant point you to an exact one; one that i tried to convert to
NAPI and found these issues was from Netlogic (embedded 64 bit mips),
that i think now is in the kernel proper (and someone had converted to
NAPI as well). Let me get back to you with some sample examples..

> Why would we expose the hardware switch physical ports as netdevs if
> we cannot even any control over their data-path? Unlike these
> multiport NICs, the only traffic you see and you can control is the
> one from your CPU port.
>

Not necessarily for datapath, rather for control path. If i can
pull the stats, ifconfig up/down the port, set flow control
etc - then that is a  good reason to expose them.

>
> I do not really see how we could bend the existing interface (is it
> rtnetlink we are talking about or something else btw?) to expose these
> switches, maybe we could with iproute2, but still, the user-space
> interface/tool is far from being the problem here.
>

Look at the FDB API.
The user space interface as well as reusing kernel interfaces is my main
arguement.

> I don't think at any point in this discussion there was a mention that
> we do not want to change the user or kernel interface in OpenWrt
> because we have been using this for the past 5 years, on the contrary,
> if we are bringing this to a wide audience, this is to get some proper
> review and eventually change it.
>

Ok, sorry - I misinterpreted you and Felix. Like i said, if you gave me
that reason I would understand.

cheers,
jamal

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-27 19:51                             ` Felix Fietkau
@ 2013-10-28 22:53                               ` Jamal Hadi Salim
  2013-10-29  9:34                                 ` Felix Fietkau
  2013-10-30 17:27                                 ` Lennert Buytenhek
  0 siblings, 2 replies; 41+ messages in thread
From: Jamal Hadi Salim @ 2013-10-28 22:53 UTC (permalink / raw)
  To: Felix Fietkau, Florian Fainelli, Neil Horman
  Cc: John Fastabend, netdev, David Miller, Sascha Hauer, John Crispin,
	Jonas Gorski, Gary Thomas, Vlad Yasevich, Stephen Hemminger,
	Lennert Buytenhek

On 10/27/13 15:51, Felix Fietkau wrote:
> On 2013-10-27 6:19 PM, Jamal Hadi Salim wrote:

> That question does not make any sense to me. Aside from low level
> control frames like pause frames for flow control, the switch has no
> need to send packets to the CPU port on its own.
> Remember what I told you about the switch being a *separate* entity from
> the NIC that connects it to the CPU.
>

I am assuming there is a MAC address which is identified to be that of a
switch. Something responds to ARP for example for that MAC. I think
you are  saying that for a certain class of switch chips, there is no
concept of "cpu port" - therefore there cannot be unicast from the
chip to the cpu.

> DSA does this, and last time I looked, it pushes *all* bridge traffic
> through the CPU, making it completely unusable for slower embedded CPUs.
>

I wasnt thinking DSA (rather some MIPS based embedded boards)- but now
that you bring it up, lets Cc Lennert.

> If I remember correctly, adding support 'bridge acceleration' was left
> as an exercise for the reader and never actually implemented.
>

 From talking to you, I realize there are things that are dumb and
cant be "accelerated". The scenarios so far have been for accelaration
(or to be correct: offloading).
And my contention is - this is a matter of capability discovery as
advertised by the driver and as used by the user tools.

> Sure, this could be fixed somehow, but even then the model and
> assumptions that DSA is built on simply don't work for some of the
> dumber switches that we support.
>

Agreed.

[.. content removed for brevity, dont think we have disagreements ..]


> One of the currently very common switches in many embedded devices is
> the RTL8366/RTL8367. It has some flexibility when it comes to
> configuring VLANs, and it's one of the few ones where you can configure
> a forwarding table for a VLAN (which spans multiple ports), which allows
> software bridging between multiple VLANs.
> However, what this switch does *not* support is adding a header/trailer
> to packets to indicate the originating port.
> This means that all per-port netdevs will be dummy ports which don't
> include the data path.
>

My view is that netdevs are still valuable even if only they get used 
for control path. Like you said earlier - you can still pull stats, flow 
control messages still make it through etc. They provide you
the consistent api to configure the switch above, ex:
If i was to use the FDB api for this switch as long as i can
abstract it in software as a bridge, I could send it a switch config
via its ops which says:
"I am giving you this entry with vland 400 for port 2, but i want you to
send it to the hardware not to your local entry"

> So let's say you have a configuration where you're using VLAN ID 4 on
> port 1, and you want to bridge it to VLAN ID 400 on port 2.
>
> Sounds easy enough, you can easily create a bridge that spans port1.4
> and port2.400. Except, this particular switch (like pretty much any
> other switch supported by swconfig) isn't actually able to handle such a
> configuration on its own.

Makes sense.
Let me point that even the Linux bridge cant handle this on its own
either.
You would need two bridges instantiated. The "cpu port" (we should call
it the "L3 port" really) is implicit in the case of the bridge i.e it
is the Linux network stack.
You would need to set the vlan filters on the bridge to strip the vlan
on egress of the first bridge etc ..

> It needs two VLAN configurations, with different forwarding table IDs,
> and then the software bridge on the CPU port needs to forward between
> the two different VLANs.
> To be able to handle such a configuration, the code would have to detect
> this kind of special case scenario, somehow hook itself via rx handler
> into the NIC connected to the CPU port and emulate that VLAN ID
> replacement behavior.
>


IMO: You dont need to muck with rx handler if you used bridge
abstraction. It becomes a config issue.

> With swconfig, you create two VLANs: VLAN 4, containing CPU and port1;
> VLAN 400, containing CPU and port2. You then create a software bridge
> between eth0.4 and eth0.400 (assuming eth0 is the NIC connected to the
> switch).
>

Can we call that "L3" instead of software bridge?
It can be done if you create a Linux bridge in software per L2 table id
in your chip. Then you attach the bridge ports.
A linux bridge of the sort, assuming there's a subnet per bridge is
configured thus:
bridge-tab1: link ports {eth0:vlan4, eth1:vlan4}, subnet 1
bridge-tab2: link ports {eth0:vlan400, eth1:vlan400}, subnet 2


> In a different scenario, the code would also have to detect
> configurations that the switch isn't able to handle, e.g.: bridging
> port1.4 to eth1 and port2.4 to eth2.
> Such a configuration wouldn't work at all with such a switch, because
> the CPU isn't able to tell apart traffic from port1 and port2, and
> there's no way to tell the switch that port1.4 and port2.4 should not be
> connected to each other, but both should go to the CPU.
>

Understood.
I think that discovery is a must - so you can apply different behavior
to different switches.
But you seem to have solved this already. Linux as is does not.
You can either have the driver tell you what it can/cant do or you
can attempt to fire and miss and get a return code that will tell
you that it cant achieve what you are asking it to do. I prefer the
former.

 >
> Those are just two simple scenarios from the top of my head - I'm pretty
> sure I could come up with a long list of further corner cases and
> quirks, which are simply either difficult to deal with, or completely
> unnatural in the model that you're describing.
>

I think these are the kind of things that need to be enumerated to come
to some conclusion.

> Trying to make all of these cases work in the code will make the whole
> thing a lot more difficult to deal with and maintain. It will also make
> it much harder for the user to figure out, what configurations work, and
> what configurations don't.
>
>
> Especially the case with reusing VLANs on different ports (but not
> connecting them to each other) is something that can easily work with
> software devices, but cannot be emulated on most embedded device
> switches. The software bridge configuration model raises a lot of
> expectations that these switches simply cannot meet.
>

I wouldnt expect every thing a software bridge does would be met by
a random switch.S/w bridge would be the super-set. But this is
not a new concept, example: Netdev itself is an abstraction - we have
USB, ethernet, wireless, variety of virtual interfaces etc.
Sometimes we dont even have the concept of a "link" in some of these
devices; infiniband would have a huge MAC address but i can still
use ifconfig on it etc.

> If you look at the swconfig model, you will see that the abstraction
> clearly communicates the limitations of these typical switches.
>

I will have to go back and look - but like i said earlier seems to me
you have solved this problem. Of the switch hardware i am familiar with
(high end pricey stuff), the capabilities tend to fall into the
following components:
-flooding control (i.e what should happen on destination failure)
-learning control (i.e what should happen on the source lookup failure)
(Ive seen knobs for "drop", "send to portX" where "X" could be cpu etc)
-fdb capacity
-whether it can do vlans, filtering pvids etc
-multicast snooping capability

To add to the above a few more based on talking to you:
- cpu port (in what ive come across this is always present, but
as you point out this cannot be assumed)
- ingress port tag (you point out that some cases this may never be
present even when the cpu port is present)
- ive never seen table id, but i think this is another one; in which
case the number of table ids becomes something one needs to discover..

cheers,
jamal

> The configuration model simply doesn't even let you express these kinds
> of unsuppported configurations that seem normal in the tools used to set
> up software bridges/vlans.
> At the same time, it's fairly consistent across the range of different
> chips that we have drivers for. That certainly leaves a much smaller
> amount of traps and surprises for users, compared to trying to emulate
> the software bridge model by hacking through the layers.
>
> Hopefully this will clear a few things up for you.
>
> - Felix
>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-28 22:53                               ` Jamal Hadi Salim
@ 2013-10-29  9:34                                 ` Felix Fietkau
  2013-10-30 11:45                                   ` Jamal Hadi Salim
  2013-10-30 17:27                                 ` Lennert Buytenhek
  1 sibling, 1 reply; 41+ messages in thread
From: Felix Fietkau @ 2013-10-29  9:34 UTC (permalink / raw)
  To: Jamal Hadi Salim, Florian Fainelli, Neil Horman
  Cc: John Fastabend, netdev, David Miller, Sascha Hauer, John Crispin,
	Jonas Gorski, Gary Thomas, Vlad Yasevich, Stephen Hemminger

On 2013-10-28 23:53, Jamal Hadi Salim wrote:
> On 10/27/13 15:51, Felix Fietkau wrote:
>> On 2013-10-27 6:19 PM, Jamal Hadi Salim wrote:
> 
>> That question does not make any sense to me. Aside from low level
>> control frames like pause frames for flow control, the switch has no
>> need to send packets to the CPU port on its own.
>> Remember what I told you about the switch being a *separate* entity from
>> the NIC that connects it to the CPU.
>>
> 
> I am assuming there is a MAC address which is identified to be that of a
> switch. Something responds to ARP for example for that MAC. I think
> you are  saying that for a certain class of switch chips, there is no
> concept of "cpu port" - therefore there cannot be unicast from the
> chip to the cpu.
These are simple switches, why would they respond to ARP?
I suspect that you're attributing too much functionality to the switch
itself. Think of it as a device similar to the cheap unmanaged ones you
can buy in a shop and hook up to your machine via Ethernet.
Add to that some very limited VLAN grouping functionality, and you're
pretty close to the limits of what these switches can do.
They don't do ARP, IP or other things. They learn about MAC addresses
from incoming packets to build their forwarding path.
The CPU port in this case is whatever port on the switch that you plug
the cable of your machine into :)

>> One of the currently very common switches in many embedded devices is
>> the RTL8366/RTL8367. It has some flexibility when it comes to
>> configuring VLANs, and it's one of the few ones where you can configure
>> a forwarding table for a VLAN (which spans multiple ports), which allows
>> software bridging between multiple VLANs.
>> However, what this switch does *not* support is adding a header/trailer
>> to packets to indicate the originating port.
>> This means that all per-port netdevs will be dummy ports which don't
>> include the data path.
> 
> My view is that netdevs are still valuable even if only they get used 
> for control path. Like you said earlier - you can still pull stats, flow 
> control messages still make it through etc. They provide you
> the consistent api to configure the switch above, ex:
> If i was to use the FDB api for this switch as long as i can
> abstract it in software as a bridge, I could send it a switch config
> via its ops which says:
> "I am giving you this entry with vland 400 for port 2, but i want you to
> send it to the hardware not to your local entry"
The FDB related abstraction that you're describing will not work with
the hardware that I'm talking about. Let's leave that one out of this
discussion.
As for per-port netdevs: Yes, you could pull stats.
No, flow control messages would not make it through.
No idea how it would provide a *consistent* API.
Either way, if adding netdevs just for stats and link state, that could
be easily added on top of swconfig (or whatever name we pick for it)
later. I just don't think it's worth it at this point.

>> So let's say you have a configuration where you're using VLAN ID 4 on
>> port 1, and you want to bridge it to VLAN ID 400 on port 2.
>>
>> Sounds easy enough, you can easily create a bridge that spans port1.4
>> and port2.400. Except, this particular switch (like pretty much any
>> other switch supported by swconfig) isn't actually able to handle such a
>> configuration on its own.
> 
> Makes sense.
> Let me point that even the Linux bridge cant handle this on its own
> either.
> You would need two bridges instantiated. The "cpu port" (we should call
> it the "L3 port" really) is implicit in the case of the bridge i.e it
> is the Linux network stack.
> You would need to set the vlan filters on the bridge to strip the vlan
> on egress of the first bridge etc ..
> 
>> It needs two VLAN configurations, with different forwarding table IDs,
>> and then the software bridge on the CPU port needs to forward between
>> the two different VLANs.
>> To be able to handle such a configuration, the code would have to detect
>> this kind of special case scenario, somehow hook itself via rx handler
>> into the NIC connected to the CPU port and emulate that VLAN ID
>> replacement behavior.
> 
> IMO: You dont need to muck with rx handler if you used bridge
> abstraction. It becomes a config issue.
If we don't need to muck with an rx handler, how are packets intercepted
from the NIC that connects to the switch?
That NIC is run by a driver that knows nothing about switch stuff.

>> With swconfig, you create two VLANs: VLAN 4, containing CPU and port1;
>> VLAN 400, containing CPU and port2. You then create a software bridge
>> between eth0.4 and eth0.400 (assuming eth0 is the NIC connected to the
>> switch).
> 
> Can we call that "L3" instead of software bridge?
L3? Why?

> Understood.
> I think that discovery is a must - so you can apply different behavior
> to different switches.
> But you seem to have solved this already. Linux as is does not.
> You can either have the driver tell you what it can/cant do or you
> can attempt to fire and miss and get a return code that will tell
> you that it cant achieve what you are asking it to do. I prefer the
> former.
I think that's way more confusing to users than presenting a consistent
model that properly reflects what you can do with the hardware.

But I sense a pattern here. I've long had my beef with quite a few Linux
network related APIs for being inconsistent, having no decent error
reporting when you're trying to configure things (errno doesn't count,
it's just too ambiguous), and just making it hard to figure out the
capabilities. Of course, none of this can be easily fixed due to ABI
stability constraints.
I do NOT wish to follow that pattern!

>> Those are just two simple scenarios from the top of my head - I'm pretty
>> sure I could come up with a long list of further corner cases and
>> quirks, which are simply either difficult to deal with, or completely
>> unnatural in the model that you're describing.
> I think these are the kind of things that need to be enumerated to come
> to some conclusion.
I'm not going to try to enumerate all the case; I have other projects
that I need to work on. :)

>> Trying to make all of these cases work in the code will make the whole
>> thing a lot more difficult to deal with and maintain. It will also make
>> it much harder for the user to figure out, what configurations work, and
>> what configurations don't.
>>
>>
>> Especially the case with reusing VLANs on different ports (but not
>> connecting them to each other) is something that can easily work with
>> software devices, but cannot be emulated on most embedded device
>> switches. The software bridge configuration model raises a lot of
>> expectations that these switches simply cannot meet.
> I wouldnt expect every thing a software bridge does would be met by
> a random switch.S/w bridge would be the super-set. But this is
> not a new concept, example: Netdev itself is an abstraction - we have
> USB, ethernet, wireless, variety of virtual interfaces etc.
> Sometimes we dont even have the concept of a "link" in some of these
> devices; infiniband would have a huge MAC address but i can still
> use ifconfig on it etc.
Only a *tiny* part of the software bridge configuration model can be
emulated, the rest does not fit and has to be handled through extensions
or different APIs anyway. That's why I am convinced that it's a really
bad model to try to make these switches fit into it.

You gain a tiny advantage with writing scripts, but at the same time,
the code gets more complex, the configuration interface gets more
confusing, there are more nasty corner cases to take care of.
Why do you insist on making so many things worse just for one tiny
advantage? Where's the pragmatic cost/benefit tradeoff?

>> If you look at the swconfig model, you will see that the abstraction
>> clearly communicates the limitations of these typical switches.
>>
> 
> I will have to go back and look - but like i said earlier seems to me
> you have solved this problem. Of the switch hardware i am familiar with
> (high end pricey stuff), the capabilities tend to fall into the
> following components:
> -flooding control (i.e what should happen on destination failure)
> -learning control (i.e what should happen on the source lookup failure)
> (Ive seen knobs for "drop", "send to portX" where "X" could be cpu etc)
> -fdb capacity
> -whether it can do vlans, filtering pvids etc
> -multicast snooping capability
Right, with most of the switches that we support, almost none of these
things work in a way that can be integrated with the network stack.

> To add to the above a few more based on talking to you:
> - cpu port (in what ive come across this is always present, but
> as you point out this cannot be assumed)
I'm not even sure what you mean when you say 'cpu port cannot be
assumed'. On pretty much all devices that we work with, one of the ports
connects to a NIC in the CPU. It's just that the switch cannot be
assumed to have special treatment for that CPU port. As far as it is
concerned, it is just another port like the others.

> - ingress port tag (you point out that some cases this may never be
> present even when the cpu port is present)
> - ive never seen table id, but i think this is another one; in which
> case the number of table ids becomes something one needs to discover..
Yes, and this is something that doesn't even map directly to something
in the software bridge world.

- Felix

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-23 12:53               ` Jamal Hadi Salim
  2013-10-23 13:31                 ` Felix Fietkau
@ 2013-10-29 23:12                 ` Maxime Bizon
  2013-10-30 11:50                   ` Jamal Hadi Salim
  1 sibling, 1 reply; 41+ messages in thread
From: Maxime Bizon @ 2013-10-29 23:12 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Felix Fietkau, Florian Fainelli, Neil Horman, John Fastabend,
	netdev, David Miller, Sascha Hauer, John Crispin, Jonas Gorski,
	Gary Thomas, Vlad Yasevich, Stephen Hemminger


On Wed, 2013-10-23 at 08:53 -0400, Jamal Hadi Salim wrote:

> So exposing the 5-8 ports as netdevs would be useful. Giving access to
> their stats through per-port netdevs etc. i.e a switch/bridge will 

While the intent is to make it look familiar to users, IMO this breaks
the rule of the least surprise.

>From a user POV, when you see a netdevice, you expect to be able to
receive or send packets from/to it. The ability to read stats/link is
only a secondary feature.

Wireless subsystem moved away from using dummy/additional netdevices
because it caused confusion.

multiqueue devices forced us to separate struct netdevice and struct
netdev_queue, maybe it's time for more surgery :)

-- 
Maxime

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-29  9:34                                 ` Felix Fietkau
@ 2013-10-30 11:45                                   ` Jamal Hadi Salim
  2013-10-30 12:53                                     ` Felix Fietkau
  0 siblings, 1 reply; 41+ messages in thread
From: Jamal Hadi Salim @ 2013-10-30 11:45 UTC (permalink / raw)
  To: Felix Fietkau, Florian Fainelli, Neil Horman
  Cc: John Fastabend, netdev, David Miller, Sascha Hauer, John Crispin,
	Jonas Gorski, Gary Thomas, Vlad Yasevich, Stephen Hemminger

On 10/29/13 05:34, Felix Fietkau wrote:
> On 2013-10-28 23:53, Jamal Hadi Salim wrote:

>
> These are simple switches, why would they respond to ARP?
> I suspect that you're attributing too much functionality to the switch
> itself. Think of it as a device similar to the cheap unmanaged ones you
> can buy in a shop and hook up to your machine via Ethernet.
> Add to that some very limited VLAN grouping functionality, and you're
> pretty close to the limits of what these switches can do.
> They don't do ARP, IP or other things. They learn about MAC addresses
> from incoming packets to build their forwarding path.
> The CPU port in this case is whatever port on the switch that you plug
> the cable of your machine into :)

Ok, got it - the only use for cpu for these things is to retrieve things
like stats, link state, etc; can you even read the fdb?


> The FDB related abstraction that you're describing will not work with
> the hardware that I'm talking about. Let's leave that one out of this
> discussion.

sigh - ok. But you gotta help me understand why.

> As for per-port netdevs: Yes, you could pull stats.
> No, flow control messages would not make it through.
> No idea how it would provide a *consistent* API.
> Either way, if adding netdevs just for stats and link state, that could
> be easily added on top of swconfig (or whatever name we pick for it)
> later. I just don't think it's worth it at this point.
>

Ok, progress, lets leave this one out.

>> Can we call that "L3" instead of software bridge?
> L3? Why?

We have two L2 domains. You want to connect them - you need a higher
layer; Layer 3 seems to be the simple one (i.e typically people would
use ip to link two layer 2 broadcast domains).


> I think that's way more confusing to users than presenting a consistent
> model that properly reflects what you can do with the hardware.
>

I think discovery from a control view is always a win.

> But I sense a pattern here. I've long had my beef with quite a few Linux
> network related APIs for being inconsistent, having no decent error
> reporting when you're trying to configure things (errno doesn't count,
> it's just too ambiguous), and just making it hard to figure out the
> capabilities. Of course, none of this can be easily fixed due to ABI
> stability constraints.
> I do NOT wish to follow that pattern!
>

You are preaching to the choir. The whole errno 8 bit thing is a mess;
I used to printk things in the kernel to indicate granularity of
which EINVAL i was returning (but i was shot down); one suggestion is
to also include a string description on the error. But that is a side
issue.
So, nod. Discovery of capabilities is better - you still have to defer
to error codes when all else fails.


> I'm not going to try to enumerate all the case; I have other projects
> that I need to work on. :)
>

I understand. I am busy as well, just saying if we need to reach an
agreement to either agree or disagree we need to capture the esoterics
of the different cases; as you can see i tried to enumerate some in
my previous email. In my case this would be useful to see, using current
mechanisms, that it can or cant be done or can be done with mods etc.

> Only a *tiny* part of the software bridge configuration model can be
> emulated, the rest does not fit and has to be handled through extensions
> or different APIs anyway. That's why I am convinced that it's a really
> bad model to try to make these switches fit into it.
>
> You gain a tiny advantage with writing scripts, but at the same time,
> the code gets more complex, the configuration interface gets more
> confusing, there are more nasty corner cases to take care of.
> Why do you insist on making so many things worse just for one tiny
> advantage? Where's the pragmatic cost/benefit tradeoff?
>

There is nothing wrong with making extensions if they make sense. My
problem so far in this discussion is i havent figured which will be bad
extensions you bring up. My approach is to list things and
then point out which one will require some witchcraft on top of
current interfaces. I am afraid I am still missing that part. Maybe
I have to go back and study your patch some more.

> Right, with most of the switches that we support, almost none of these
> things work in a way that can be integrated with the network stack.
>

Good to know. These are useful components for slightly higher end
switches.


> I'm not even sure what you mean when you say 'cpu port cannot be
> assumed'.

Meant for other devices which are dumb - lets move past this point.

> On pretty much all devices that we work with, one of the ports
> connects to a NIC in the CPU. It's just that the switch cannot be
> assumed to have special treatment for that CPU port. As far as it is
> concerned, it is just another port like the others.
>

Aha. I think i see a small terminology cross-talk. You refer to things
as NICs when i use the term netdev. So now i understand better what you
mean by rx handler (I intepreted earlier to mean something at the tap
level). Ok, so Felix, for the case where we have switches with cpu ports
that can tag incoming packets with ingress port ids - can we say the
NIC rx handler is reasonable to be used as a demux point for the
software version of the ports? I am not talking about the corner
cases.

>> - ive never seen table id, but i think this is another one; in which
>> case the number of table ids becomes something one needs to discover..
> Yes, and this is something that doesn't even map directly to something
> in the software bridge world.
>

It does - There is a single table per bridge on the software bridge
world. You need multiple bridges, one per id.

cheers,
jamal

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-29 23:12                 ` Maxime Bizon
@ 2013-10-30 11:50                   ` Jamal Hadi Salim
  2013-10-30 11:58                     ` Felix Fietkau
  2013-10-30 14:28                     ` Maxime Bizon
  0 siblings, 2 replies; 41+ messages in thread
From: Jamal Hadi Salim @ 2013-10-30 11:50 UTC (permalink / raw)
  To: mbizon
  Cc: Felix Fietkau, Florian Fainelli, Neil Horman, John Fastabend,
	netdev, David Miller, Sascha Hauer, John Crispin, Jonas Gorski,
	Gary Thomas, Vlad Yasevich, Stephen Hemminger

On 10/29/13 19:12, Maxime Bizon wrote:

>
>  From a user POV, when you see a netdevice, you expect to be able to
> receive or send packets from/to it. The ability to read stats/link is
> only a secondary feature.
>

The important part is all the APIs stay consistent. I can use
same netlink calls. ifconfig works.
iproute2 works. People have written books on this stuff - we dont
have MCSE(Must Call Software Engineer) certification, but this is
as close as it gets. i.e the knowledge has been commoditized, even
my kid knows how to use these tools.

If i can get stats by doing ifconfig - that should provide illusion that
the netdevice is sending/receiving packets.

> Wireless subsystem moved away from using dummy/additional netdevices
> because it caused confusion.
>

This is a good arguement.
Can we hear a little more about this?

> multiqueue devices forced us to separate struct netdevice and struct
> netdev_queue, maybe it's time for more surgery :)
>

I think that would be a reasonable thing to do if it becomes necessary.

cheers,
jamal

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-30 11:50                   ` Jamal Hadi Salim
@ 2013-10-30 11:58                     ` Felix Fietkau
  2013-10-30 14:28                     ` Maxime Bizon
  1 sibling, 0 replies; 41+ messages in thread
From: Felix Fietkau @ 2013-10-30 11:58 UTC (permalink / raw)
  To: Jamal Hadi Salim, mbizon
  Cc: Florian Fainelli, Neil Horman, John Fastabend, netdev,
	David Miller, Sascha Hauer, John Crispin, Jonas Gorski,
	Gary Thomas, Vlad Yasevich, Stephen Hemminger

On 2013-10-30 12:50, Jamal Hadi Salim wrote:
> On 10/29/13 19:12, Maxime Bizon wrote:
> 
>>
>>  From a user POV, when you see a netdevice, you expect to be able to
>> receive or send packets from/to it. The ability to read stats/link is
>> only a secondary feature.
>>
> 
> The important part is all the APIs stay consistent. I can use
> same netlink calls. ifconfig works.
> iproute2 works. People have written books on this stuff - we dont
> have MCSE(Must Call Software Engineer) certification, but this is
> as close as it gets. i.e the knowledge has been commoditized, even
> my kid knows how to use these tools.
> 
> If i can get stats by doing ifconfig - that should provide illusion that
> the netdevice is sending/receiving packets.
Pretty much all of the above have serious limitations when you're not
actually able to run the data path through the per-port netdevs.
You can't assign IP addresses to them. The network stack will probably
even attempt to assign IPv6 link-local addresses to these things,
causing even more confusion.
You can't add them to normal software bridges like other devices.
You can't use bonding. I could probably go on for a while.
There's a huge list of things that you simply cannot do with these
interfaces, and without knowing the details of the implementation, users
will be left clueless as to why that is.
I'd say that's a very serious violation of the principle of least surprise.
And knowing what the typical OpenWrt users do with their devices, I can
already forsee the bogus bug reports trickling in, if this is to be
implemented.

- Felix

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-30 11:45                                   ` Jamal Hadi Salim
@ 2013-10-30 12:53                                     ` Felix Fietkau
  0 siblings, 0 replies; 41+ messages in thread
From: Felix Fietkau @ 2013-10-30 12:53 UTC (permalink / raw)
  To: Jamal Hadi Salim, Florian Fainelli, Neil Horman
  Cc: John Fastabend, netdev, David Miller, Sascha Hauer, John Crispin,
	Jonas Gorski, Gary Thomas, Vlad Yasevich, Stephen Hemminger

On 2013-10-30 12:45, Jamal Hadi Salim wrote:
> On 10/29/13 05:34, Felix Fietkau wrote:
>> On 2013-10-28 23:53, Jamal Hadi Salim wrote:
> 
>>
>> These are simple switches, why would they respond to ARP?
>> I suspect that you're attributing too much functionality to the switch
>> itself. Think of it as a device similar to the cheap unmanaged ones you
>> can buy in a shop and hook up to your machine via Ethernet.
>> Add to that some very limited VLAN grouping functionality, and you're
>> pretty close to the limits of what these switches can do.
>> They don't do ARP, IP or other things. They learn about MAC addresses
>> from incoming packets to build their forwarding path.
>> The CPU port in this case is whatever port on the switch that you plug
>> the cable of your machine into :)
> 
> Ok, got it - the only use for cpu for these things is to retrieve things
> like stats, link state, etc; can you even read the fdb?
Where supported, all you can typically read is a list of which MAC
address was discovered behind which port - if you're lucky. You usually
won't find VLAN information attached to that.
Often it simply isn't supported at all.

>> The FDB related abstraction that you're describing will not work with
>> the hardware that I'm talking about. Let's leave that one out of this
>> discussion.
> 
> sigh - ok. But you gotta help me understand why.
The hardware implementation of MAC address handling isn't even
consistent across chips from different vendors. Often you don't even get
things like the VLAN ID. Sometimes there's a global forwarding table,
sometimes you can have multiple tables and assign them to VLANs.

>>> Can we call that "L3" instead of software bridge?
>> L3? Why?
> 
> We have two L2 domains. You want to connect them - you need a higher
> layer; Layer 3 seems to be the simple one (i.e typically people would
> use ip to link two layer 2 broadcast domains).
If you connect two L2 domains through a bridge, I still consider that L2
- it's still on the same layer, just goes through more hops.

>> I think that's way more confusing to users than presenting a consistent
>> model that properly reflects what you can do with the hardware.
> I think discovery from a control view is always a win.
Yes, and swconfig handles the discovery part fairly well.

>> I'm not going to try to enumerate all the case; I have other projects
>> that I need to work on. :)
> 
> I understand. I am busy as well, just saying if we need to reach an
> agreement to either agree or disagree we need to capture the esoterics
> of the different cases; as you can see i tried to enumerate some in
> my previous email. In my case this would be useful to see, using current
> mechanisms, that it can or cant be done or can be done with mods etc.
At this point, I'm not sure if we will be able to reach an agreement. I
think I've shown over and over again that what you're proposing comes
with huge costs in terms of complexity and bloat, as demonstrated by the
fact that it adds so many corner cases that would have to be dealt with,
including many for which we haven't even the slightest idea of a good
solution.
Now, to make this a viable option, the benefits would have to be big and
significant enough to offset these costs.
The only real benefit you've pointed out so far is to be able to reuse
existing tools/APIs (but only with modifications, not as-is). I think
that's fairly small, when put in perspective with the hard problems that
this approach creates, both for users (hidden traps and surprises) and
for developers (implementation difficulties and incompatible abstractions).

>> Only a *tiny* part of the software bridge configuration model can be
>> emulated, the rest does not fit and has to be handled through extensions
>> or different APIs anyway. That's why I am convinced that it's a really
>> bad model to try to make these switches fit into it.
>>
>> You gain a tiny advantage with writing scripts, but at the same time,
>> the code gets more complex, the configuration interface gets more
>> confusing, there are more nasty corner cases to take care of.
>> Why do you insist on making so many things worse just for one tiny
>> advantage? Where's the pragmatic cost/benefit tradeoff?
>>
> 
> There is nothing wrong with making extensions if they make sense.
Yes, but if the basic abstraction doesn't make sense for the use case,
and it leads to too many corner cases, there's everything wrong with
trying to work around that through extensions.

> My problem so far in this discussion is i havent figured which will be bad
> extensions you bring up. My approach is to list things and
> then point out which one will require some witchcraft on top of
> current interfaces. I am afraid I am still missing that part. Maybe
> I have to go back and study your patch some more.
Sure, go ahead.

>> On pretty much all devices that we work with, one of the ports
>> connects to a NIC in the CPU. It's just that the switch cannot be
>> assumed to have special treatment for that CPU port. As far as it is
>> concerned, it is just another port like the others.
> 
> Aha. I think i see a small terminology cross-talk. You refer to things
> as NICs when i use the term netdev. So now i understand better what you
> mean by rx handler (I intepreted earlier to mean something at the tap
> level). 
I only started using the term NIC to emphasize that it's not just a
netdev of the switch - it's a real Ethernet MAC (usually in the SoC),
with a separate driver that knows nothing about the switch.

> Ok, so Felix, for the case where we have switches with cpu ports
> that can tag incoming packets with ingress port ids - can we say the
> NIC rx handler is reasonable to be used as a demux point for the
> software version of the ports? I am not talking about the corner
> cases.
Yes, but when looking at the big picture, the switch being able to tag
incoming packets with the ingress port is a corner case!
Most switches that we work with aren't actually able to do that!
I want to have a decent baseline implementation that does not assume
this port tagging capability.

>>> - ive never seen table id, but i think this is another one; in which
>>> case the number of table ids becomes something one needs to discover..
>> Yes, and this is something that doesn't even map directly to something
>> in the software bridge world.
> It does - There is a single table per bridge on the software bridge
> world. You need multiple bridges, one per id.
Depends on which software bridge.
If I have two normal netdevs, eth0 and eth1, I can create eth0.4 and
bridge it to eth1.5. That's just one bridge.
I can't easily emulate that with fake per-port netdevs and a typical
switch supported by swconfig.
With just swconfig (no fake netdevs) switches that support these table
ids, I would need to have two VLANs in the switch (both connected to the
CPU port, each one getting a separate table id), and then one software
bridge between eth0.4 and eth0.5 (assuming eth0 connects to the switch).

- Felix

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-30 11:50                   ` Jamal Hadi Salim
  2013-10-30 11:58                     ` Felix Fietkau
@ 2013-10-30 14:28                     ` Maxime Bizon
  1 sibling, 0 replies; 41+ messages in thread
From: Maxime Bizon @ 2013-10-30 14:28 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Felix Fietkau, Florian Fainelli, Neil Horman, John Fastabend,
	netdev, David Miller, Sascha Hauer, John Crispin, Jonas Gorski,
	Gary Thomas, Vlad Yasevich, Stephen Hemminger


On Wed, 2013-10-30 at 07:50 -0400, Jamal Hadi Salim wrote:

> The important part is all the APIs stay consistent. I can use
> same netlink calls. ifconfig works.
> iproute2 works. People have written books on this stuff - we dont

these books usually start by telling people to assign IP address to
interfaces, not applicable here.

> If i can get stats by doing ifconfig - that should provide illusion that
> the netdevice is sending/receiving packets.

4 separated netdevices looks like 4 ethernet segments to me, and nothing
will prevent me from setting a different ip network on each device.

ENOTSUPP cannot be returned by ndo_start_xmit, the ability for a
netdevice to be able to receive/send packet from host is IMO
fundamental.

> This is a good arguement.
> Can we hear a little more about this?

see this kind of old threads:

http://rt2x00.serialmonkey.com/phpBB/viewtopic.php?f=5&t=4378
http://www.linuxquestions.org/questions/linux-networking-3/what-is-wmaster0-728708/
http://forums.debian.net/viewtopic.php?p=219440

> I think that would be a reasonable thing to do if it becomes
> necessary.

with rough naming:

- struct netdevice
- struct netdev_queue
- struct network_port (something to call ethtool on)
- struct bridge_dev (something you create/destroy vlan on, control FDB)
- struct bridge_port (something you set path cost on, ...)
- struct sw_bridge_dev (netdevice + underlying bridge_dev)
- struct sw_bridge_port (netdevice + underlying bridge_port)

old netdevice => (netdevice + netdev_queue * x + network_port)

ethtool works on netdevice or network_port

brctl addbr/addif creates sw_bridge_dev/sw_bridge_port, other commands
work on bridge_dev/bridge_port

drivers can register bridge_dev / bridge_port / network_port

simple case of a system with single ethernet mac & directly attached 4
ports switch:

netdevice: eth0
bridge_dev: hwbr0
bridge_port: hwbr0p0, hwbr0p1, hwbr0p2, hwbr0p3
network ports: eth0np0, hwbr0np0, hwbr0np1, hwbr0np2, hwbr0np3

ifconfig, ip link show only eth0
brctl show hwbr0
ethtool works on eth0 or eth0p0, hwbr0npX

-- 
Maxime

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-28 22:53                               ` Jamal Hadi Salim
  2013-10-29  9:34                                 ` Felix Fietkau
@ 2013-10-30 17:27                                 ` Lennert Buytenhek
  2013-10-30 17:34                                   ` Lennert Buytenhek
  2013-10-30 19:47                                   ` Felix Fietkau
  1 sibling, 2 replies; 41+ messages in thread
From: Lennert Buytenhek @ 2013-10-30 17:27 UTC (permalink / raw)
  To: Jamal Hadi Salim, Felix Fietkau
  Cc: Florian Fainelli, Neil Horman, John Fastabend, netdev,
	David Miller, Sascha Hauer, John Crispin, Jonas Gorski,
	Gary Thomas, Vlad Yasevich, Stephen Hemminger, Chris Healy

I didn't follow the rest of this thread, but..


On Mon, Oct 28, 2013 at 06:53:29PM -0400, Jamal Hadi Salim wrote:

> >That question does not make any sense to me. Aside from low level
> >control frames like pause frames for flow control, the switch has no
> >need to send packets to the CPU port on its own.

..a lot of people want to be able to do Spanning Tree, LLDP, 802.1x,
you name it, on their routers and access points, and that requires
that your CPU can send/receive packets to/from individual ports on
your switch chip.  In a lot of markets, your product is a non-starter
if it can't provide any or all of the above.  Excluding this entire
class of use cases _by software design_ is somewhat myopic and stupid.

(It's a different thing if your switch chip is dumb and can't actually
address individual ports, but then there's still no reason to impose
the same restrictions on your software design.)


> >DSA does this, and last time I looked, it pushes *all* bridge traffic
> >through the CPU, making it completely unusable for slower embedded CPUs.
>
> [...]
>
> >If I remember correctly, adding support 'bridge acceleration' was left
> >as an exercise for the reader and never actually implemented.

This patch does exactly that:

	http://patchwork.ozlabs.org/patch/16578/

This patch is in production use in a couple of million DSL gateways,
as well as in a bunch of airplane in-flight entertainment systems, so
by all means I would say that it works rather well.

If there is renewed interest in having such functionality upstream,
I would be happy to update the patch and submit it for inclusion.


> >Sure, this could be fixed somehow, but even then the model and
> >assumptions that DSA is built on simply don't work for some of the
> >dumber switches that we support.

What model and assumptions would those be?


> >One of the currently very common switches in many embedded devices is
> >the RTL8366/RTL8367. It has some flexibility when it comes to
> >configuring VLANs, and it's one of the few ones where you can configure
> >a forwarding table for a VLAN (which spans multiple ports), which allows
> >software bridging between multiple VLANs.
> >However, what this switch does *not* support is adding a header/trailer
> >to packets to indicate the originating port.

The ingress/egress port doesn't _have_ to be conveyed in the data
packets themselves.

>From a quick look at the RTL8366 datasheet, you can control the
egress port by creating a temporary MAC address table entry (this
seems to work both for unicast and multicast packets).

Admittedly, I didn't have a very thorough look at the datasheet,
but it also mentions the Spanning Tree protocol, and contains this
remark related to receiving BPDUs: "The CPU port should carry the
ingress port number of the receiving BPDU.".  If this switch chip
can't do per-port addressing, how can it actually ever speak STP
at all?  Is the datasheet just lying about this?


> >This means that all per-port netdevs will be dummy ports which don't
> >include the data path.

And I think that's fine.

Look, even if you're not going to address data traffic to individual
ports on your switch chip, there's still a plethora of per-port
operations that you want to be able to do: administratively setting
the link state on ports up and down, controlling autonegotiation and
other PHY settings on individual ports, etc.

You can either let the administrator do this with the standard ifconfig
/ ip link / ethtool tools, or you can make up a parallel API and
corresponding set of userland tools to duplicate most of the existing
functionality -- I know which option I prefer.

Presenting each switch port as an individual Linux netdevice to the OS
is an orthogonal decision to actually using those netdevices for data
traffic, and conflating the two by arguing that you need special tools
to do per-port operations for the sole reason that your switch chip
cannot address individual ports is a rather confused argument.


> My view is that netdevs are still valuable even if only they get
> used for control path. Like you said earlier - you can still pull
> stats, flow control messages still make it through etc. They provide
> you
> the consistent api to configure the switch above, ex:
> If i was to use the FDB api for this switch as long as i can
> abstract it in software as a bridge, I could send it a switch config
> via its ops which says:
> "I am giving you this entry with vland 400 for port 2, but i want you to
> send it to the hardware not to your local entry"

Fully agreed on this.


> >So let's say you have a configuration where you're using VLAN ID 4 on
> >port 1, and you want to bridge it to VLAN ID 400 on port 2.
> >
> >Sounds easy enough, you can easily create a bridge that spans port1.4
> >and port2.400. Except, this particular switch (like pretty much any
> >other switch supported by swconfig) isn't actually able to handle such a
> >configuration on its own.

Neither can DSA switch chips.

You can always find things that Linux can do that your switch chip
cannot (e.g. stateful firewalling between ports), and that isn't much
of an argument for or against anything.


> >With swconfig, you create two VLANs: VLAN 4, containing CPU and port1;
> >VLAN 400, containing CPU and port2. You then create a software bridge
> >between eth0.4 and eth0.400 (assuming eth0 is the NIC connected to the
> >switch).

With DSA, you would bridge between port1.4 and port2.400.  I'm still
not sure what your argument is arguing for or against.


> >In a different scenario, the code would also have to detect
> >configurations that the switch isn't able to handle, e.g.: bridging
> >port1.4 to eth1 and port2.4 to eth2.
> >Such a configuration wouldn't work at all with such a switch, because
> >the CPU isn't able to tell apart traffic from port1 and port2, and
> >there's no way to tell the switch that port1.4 and port2.4 should not be
> >connected to each other, but both should go to the CPU.

And it's quite easy to detect what your switch chip can do and offload
that part to the hardware, and keep doing the rest in software.


> >Trying to make all of these cases work in the code will make the whole
> >thing a lot more difficult to deal with and maintain. It will also make
> >it much harder for the user to figure out, what configurations work, and
> >what configurations don't.

It's actually quite easy, and certainly a lot less total effort than
forcing all of your users to learn a new set of userland tools (unless
you're not aiming to ever have a lot of users, that is..).


thanks,
Lennert

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-30 17:27                                 ` Lennert Buytenhek
@ 2013-10-30 17:34                                   ` Lennert Buytenhek
  2013-10-30 17:56                                     ` John Fastabend
  2013-10-30 17:56                                     ` John Fastabend
  2013-10-30 19:47                                   ` Felix Fietkau
  1 sibling, 2 replies; 41+ messages in thread
From: Lennert Buytenhek @ 2013-10-30 17:34 UTC (permalink / raw)
  To: Jamal Hadi Salim, Felix Fietkau
  Cc: Florian Fainelli, Neil Horman, John Fastabend, netdev,
	David Miller, Sascha Hauer, John Crispin, Jonas Gorski,
	Gary Thomas, Vlad Yasevich, Stephen Hemminger, Chris Healy

On Wed, Oct 30, 2013 at 06:27:56PM +0100, Lennert Buytenhek wrote:

> > >This means that all per-port netdevs will be dummy ports which don't
> > >include the data path.
> 
> And I think that's fine.
> 
> Look, even if you're not going to address data traffic to individual
> ports on your switch chip, there's still a plethora of per-port
> operations that you want to be able to do: administratively setting
> the link state on ports up and down, controlling autonegotiation and
> other PHY settings on individual ports, etc.
> 
> You can either let the administrator do this with the standard ifconfig
> / ip link / ethtool tools, or you can make up a parallel API and
> corresponding set of userland tools to duplicate most of the existing
> functionality -- I know which option I prefer.
> 
> Presenting each switch port as an individual Linux netdevice to the OS
> is an orthogonal decision to actually using those netdevices for data
> traffic, and conflating the two by arguing that you need special tools
> to do per-port operations for the sole reason that your switch chip
> cannot address individual ports is a rather confused argument.

Forgot to add: there's a patch for net/dsa that adds exactly such an
option.  We called it 'unmanaged' mode, and it doesn't enable packet
tagging on the CPU<->switch chip interface, so that data only ever
flows over a single network interface ("eth0"), while the other
("dummy") network interfaces ("port1", "port2", etc) are used for
setting link state with ip link, setting PHY settings with ethtool,
getting ethtool statistics, etc, with 100% unmodified userland tools.
This patch is currently buried inside a vendor tree, but I'd be happy
to dig it out and submit it.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-30 17:34                                   ` Lennert Buytenhek
@ 2013-10-30 17:56                                     ` John Fastabend
  2013-10-30 17:56                                     ` John Fastabend
  1 sibling, 0 replies; 41+ messages in thread
From: John Fastabend @ 2013-10-30 17:56 UTC (permalink / raw)
  To: Lennert Buytenhek
  Cc: Jamal Hadi Salim, Felix Fietkau, Florian Fainelli, Neil Horman,
	netdev, David Miller, Sascha Hauer, John Crispin, Jonas Gorski,
	Gary Thomas, Vlad Yasevich, Stephen Hemminger, Chris Healy

On 10/30/2013 10:34 AM, Lennert Buytenhek wrote:
> On Wed, Oct 30, 2013 at 06:27:56PM +0100, Lennert Buytenhek wrote:
>
>>>> This means that all per-port netdevs will be dummy ports which don't
>>>> include the data path.
>>
>> And I think that's fine.
>>
>> Look, even if you're not going to address data traffic to individual
>> ports on your switch chip, there's still a plethora of per-port
>> operations that you want to be able to do: administratively setting
>> the link state on ports up and down, controlling autonegotiation and
>> other PHY settings on individual ports, etc.
>>
>> You can either let the administrator do this with the standard ifconfig
>> / ip link / ethtool tools, or you can make up a parallel API and
>> corresponding set of userland tools to duplicate most of the existing
>> functionality -- I know which option I prefer.
>>
>> Presenting each switch port as an individual Linux netdevice to the OS
>> is an orthogonal decision to actually using those netdevices for data
>> traffic, and conflating the two by arguing that you need special tools
>> to do per-port operations for the sole reason that your switch chip
>> cannot address individual ports is a rather confused argument.
>
> Forgot to add: there's a patch for net/dsa that adds exactly such an
> option.  We called it 'unmanaged' mode, and it doesn't enable packet
> tagging on the CPU<->switch chip interface, so that data only ever
> flows over a single network interface ("eth0"), while the other
> ("dummy") network interfaces ("port1", "port2", etc) are used for
> setting link state with ip link, setting PHY settings with ethtool,
> getting ethtool statistics, etc, with 100% unmodified userland tools.
> This patch is currently buried inside a vendor tree, but I'd be happy
> to dig it out and submit it.
>

A "dummy" network interface is something I've been thinking about
for SR-IOV as well. In the SR-IOV case we have an embedded bridge
in the hardware but the virtual functions may be direct assigned
to a guest and not visible to the host.

It would be easier to manage the ports and assign them to different
bridge/QOS objects (OVS, bridge, nftables) if the ports were visible
and manageable in the host even though there is no data path. Today
we special ndo ops that only work for VFs but this is a bit clumsy
and gets more clumsy as the nic switch becomes more like a real switch.

.John

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-30 17:34                                   ` Lennert Buytenhek
  2013-10-30 17:56                                     ` John Fastabend
@ 2013-10-30 17:56                                     ` John Fastabend
  1 sibling, 0 replies; 41+ messages in thread
From: John Fastabend @ 2013-10-30 17:56 UTC (permalink / raw)
  To: Lennert Buytenhek
  Cc: Jamal Hadi Salim, Felix Fietkau, Florian Fainelli, Neil Horman,
	netdev, David Miller, Sascha Hauer, John Crispin, Jonas Gorski,
	Gary Thomas, Vlad Yasevich, Stephen Hemminger, Chris Healy

On 10/30/2013 10:34 AM, Lennert Buytenhek wrote:
> On Wed, Oct 30, 2013 at 06:27:56PM +0100, Lennert Buytenhek wrote:
>
>>>> This means that all per-port netdevs will be dummy ports which don't
>>>> include the data path.
>>
>> And I think that's fine.
>>
>> Look, even if you're not going to address data traffic to individual
>> ports on your switch chip, there's still a plethora of per-port
>> operations that you want to be able to do: administratively setting
>> the link state on ports up and down, controlling autonegotiation and
>> other PHY settings on individual ports, etc.
>>
>> You can either let the administrator do this with the standard ifconfig
>> / ip link / ethtool tools, or you can make up a parallel API and
>> corresponding set of userland tools to duplicate most of the existing
>> functionality -- I know which option I prefer.
>>
>> Presenting each switch port as an individual Linux netdevice to the OS
>> is an orthogonal decision to actually using those netdevices for data
>> traffic, and conflating the two by arguing that you need special tools
>> to do per-port operations for the sole reason that your switch chip
>> cannot address individual ports is a rather confused argument.
>
> Forgot to add: there's a patch for net/dsa that adds exactly such an
> option.  We called it 'unmanaged' mode, and it doesn't enable packet
> tagging on the CPU<->switch chip interface, so that data only ever
> flows over a single network interface ("eth0"), while the other
> ("dummy") network interfaces ("port1", "port2", etc) are used for
> setting link state with ip link, setting PHY settings with ethtool,
> getting ethtool statistics, etc, with 100% unmodified userland tools.
> This patch is currently buried inside a vendor tree, but I'd be happy
> to dig it out and submit it.
>

A "dummy" network interface is something I've been thinking about
for SR-IOV nics as well. In the SR-IOV case we have an embedded bridge
in the hardware but the virtual functions may be direct assigned
to a guest and not visible to the host.

It would be easier to manage the ports and assign them to different
bridge/QOS objects (OVS, bridge, nftables) if the ports were visible
and manageable in the host even though there is no data path. Today
we special ndo ops that only work for VFs but this is a bit clumsy
and gets more clumsy as the nic switch becomes more like a real switch.

.John

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-30 17:27                                 ` Lennert Buytenhek
  2013-10-30 17:34                                   ` Lennert Buytenhek
@ 2013-10-30 19:47                                   ` Felix Fietkau
  2013-12-07  1:45                                     ` Florian Fainelli
  1 sibling, 1 reply; 41+ messages in thread
From: Felix Fietkau @ 2013-10-30 19:47 UTC (permalink / raw)
  To: Lennert Buytenhek, Jamal Hadi Salim
  Cc: Florian Fainelli, Neil Horman, John Fastabend, netdev,
	David Miller, Sascha Hauer, John Crispin, Jonas Gorski,
	Gary Thomas, Vlad Yasevich, Stephen Hemminger, Chris Healy

On 2013-10-30 18:27, Lennert Buytenhek wrote:
> I didn't follow the rest of this thread, but..
> 
> 
> On Mon, Oct 28, 2013 at 06:53:29PM -0400, Jamal Hadi Salim wrote:
> 
>> >That question does not make any sense to me. Aside from low level
>> >control frames like pause frames for flow control, the switch has no
>> >need to send packets to the CPU port on its own.
> 
> ..a lot of people want to be able to do Spanning Tree, LLDP, 802.1x,
> you name it, on their routers and access points, and that requires
> that your CPU can send/receive packets to/from individual ports on
> your switch chip.  In a lot of markets, your product is a non-starter
> if it can't provide any or all of the above.  Excluding this entire
> class of use cases _by software design_ is somewhat myopic and stupid.
> 
> (It's a different thing if your switch chip is dumb and can't actually
> address individual ports, but then there's still no reason to impose
> the same restrictions on your software design.)
Many of the switches we support can't address individual ports via tags.
You usually just set up VLANs and let the switch do the rest.

>> >DSA does this, and last time I looked, it pushes *all* bridge traffic
>> >through the CPU, making it completely unusable for slower embedded CPUs.
>>
>> [...]
>>
>> >If I remember correctly, adding support 'bridge acceleration' was left
>> >as an exercise for the reader and never actually implemented.
> 
> This patch does exactly that:
> 
> 	http://patchwork.ozlabs.org/patch/16578/
> 
> This patch is in production use in a couple of million DSL gateways,
> as well as in a bunch of airplane in-flight entertainment systems, so
> by all means I would say that it works rather well.
> 
> If there is renewed interest in having such functionality upstream,
> I would be happy to update the patch and submit it for inclusion.
Yes, I would really like to see this merged. If we can somehow get the
bridge offload stuff to handle VLAN trunking as well, I'd be interested
in looking into DSA support for some Atheros switches that I've been
working with.

>> >Sure, this could be fixed somehow, but even then the model and
>> >assumptions that DSA is built on simply don't work for some of the
>> >dumber switches that we support.
> 
> What model and assumptions would those be?
The assumption of being able to address individual ports via tags.

>> >One of the currently very common switches in many embedded devices is
>> >the RTL8366/RTL8367. It has some flexibility when it comes to
>> >configuring VLANs, and it's one of the few ones where you can configure
>> >a forwarding table for a VLAN (which spans multiple ports), which allows
>> >software bridging between multiple VLANs.
>> >However, what this switch does *not* support is adding a header/trailer
>> >to packets to indicate the originating port.
> 
> The ingress/egress port doesn't _have_ to be conveyed in the data
> packets themselves.
> 
> From a quick look at the RTL8366 datasheet, you can control the
> egress port by creating a temporary MAC address table entry (this
> seems to work both for unicast and multicast packets).
Sounds nasty. I certainly wouldn't want this called from the data path,
since on some systems register access will be bit-banged over GPIO.

> Admittedly, I didn't have a very thorough look at the datasheet,
> but it also mentions the Spanning Tree protocol, and contains this
> remark related to receiving BPDUs: "The CPU port should carry the
> ingress port number of the receiving BPDU.".  If this switch chip
> can't do per-port addressing, how can it actually ever speak STP
> at all?  Is the datasheet just lying about this?
>From what the datasheet, it looks like it expects the CPU to guess the
port based on the VLAN ID. This is RealTek, so it might just be
theoretically possible to do what the datasheet says, but quirky enough
to be unusable in practice.

>> >This means that all per-port netdevs will be dummy ports which don't
>> >include the data path.
> 
> And I think that's fine.
> 
> Look, even if you're not going to address data traffic to individual
> ports on your switch chip, there's still a plethora of per-port
> operations that you want to be able to do: administratively setting
> the link state on ports up and down, controlling autonegotiation and
> other PHY settings on individual ports, etc.
> 
> You can either let the administrator do this with the standard ifconfig
> / ip link / ethtool tools, or you can make up a parallel API and
> corresponding set of userland tools to duplicate most of the existing
> functionality -- I know which option I prefer.
> 
> Presenting each switch port as an individual Linux netdevice to the OS
> is an orthogonal decision to actually using those netdevices for data
> traffic, and conflating the two by arguing that you need special tools
> to do per-port operations for the sole reason that your switch chip
> cannot address individual ports is a rather confused argument.
The thing that most swconfig users in OpenWrt care about is being able
to group ports into VLANs, sometimes just to be able to split them into
LAN/WAN, sometimes to be able to use one port as a trunking port to
connect multiple networks (some of which may be on ports of the same
switch, some behind the CPU port).
I care about that part a lot more than messing around with the
individual ports.
If we can figure out a clean and simple way to support this well, even
on switches that are seriously limited wrt. individual port addressing
via the data path, I'd be more willing to consider it.
I still don't like the dummy netdev thing very much, because I know
enough users that will easily get confused by this, and with a separate
interface they at least know that there's a separate set of rules to it.
I don't think that's a confused argument.

>> >With swconfig, you create two VLANs: VLAN 4, containing CPU and port1;
>> >VLAN 400, containing CPU and port2. You then create a software bridge
>> >between eth0.4 and eth0.400 (assuming eth0 is the NIC connected to the
>> >switch).
> 
> With DSA, you would bridge between port1.4 and port2.400.  I'm still
> not sure what your argument is arguing for or against.
I'm saying most switches that we support cannot do DSA-style packet port
tagging for ingress/egress. That kind of setup can be done with some
software bridging when setting up VLAN tables appropriately, but I'm not
sure it's possible to emulate this if you're treating the switch as a
'bridge' and trying to do handle this via the FDB API, which is what we
were discussing earlier.

- Felix

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet switch configuration API
  2013-10-30 19:47                                   ` Felix Fietkau
@ 2013-12-07  1:45                                     ` Florian Fainelli
  0 siblings, 0 replies; 41+ messages in thread
From: Florian Fainelli @ 2013-12-07  1:45 UTC (permalink / raw)
  To: Felix Fietkau
  Cc: Lennert Buytenhek, Jamal Hadi Salim, Neil Horman, John Fastabend,
	netdev, David Miller, Sascha Hauer, John Crispin, Jonas Gorski,
	Gary Thomas, Vlad Yasevich, Stephen Hemminger, Chris Healy

2013/10/30 Felix Fietkau <nbd@openwrt.org>:
>> With DSA, you would bridge between port1.4 and port2.400.  I'm still
>> not sure what your argument is arguing for or against.
> I'm saying most switches that we support cannot do DSA-style packet port
> tagging for ingress/egress. That kind of setup can be done with some
> software bridging when setting up VLAN tables appropriately, but I'm not
> sure it's possible to emulate this if you're treating the switch as a
> 'bridge' and trying to do handle this via the FDB API, which is what we
> were discussing earlier.

DSA works nicely because it has its own Ethertype, and also because
the switches are configurable enough to allow for the insertion of a
custom Ethertype for which you can register a corresponding
dev_add_pack() handler. Even on some switches which support custom
tags appended or prepended such as those from Broadcom, a tag is
usually inserted between ethe Ethernet Source Address and the
Ethertype, but this tag has variable fields. I could imagine that we
extend the dev_add_pack() to include a mask to perform the matching,
but this is going to add an "and" logical operation in a hot-path,
would that be acceptable?

And this only works when the tag is prepended, if it is appended, we
are kind of stuck because we do not really have any way to look for it
at the end of the frame but assume that is here/not here...
-- 
Florian

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2013-12-07  1:46 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-22 18:23 [PATCH 0/4 net-next] net: phy: add Generic Netlink switch configuration API Florian Fainelli
2013-10-22 18:23 ` [PATCH 1/4 net-next] net: phy: add Generic Netlink Ethernet " Florian Fainelli
2013-10-22 19:22   ` Dan Williams
2013-10-22 19:32     ` Florian Fainelli
2013-10-22 19:47       ` David Miller
     [not found]         ` <1382477150.19269.69.camel@dcbw.foobar.com>
2013-10-22 21:22           ` David Miller
2013-10-22 19:46     ` David Miller
2013-10-22 19:53   ` John Fastabend
2013-10-22 19:59     ` Florian Fainelli
2013-10-22 20:25       ` Neil Horman
2013-10-22 22:09         ` Florian Fainelli
2013-10-23 11:34           ` Neil Horman
2013-10-23 11:47           ` Jamal Hadi Salim
2013-10-23 12:04             ` Felix Fietkau
2013-10-23 12:53               ` Jamal Hadi Salim
2013-10-23 13:31                 ` Felix Fietkau
2013-10-23 14:09                   ` Jamal Hadi Salim
2013-10-23 14:32                     ` Felix Fietkau
2013-10-25 11:43                       ` Jamal Hadi Salim
2013-10-25 13:01                         ` Felix Fietkau
2013-10-27 17:19                           ` Jamal Hadi Salim
2013-10-27 18:14                             ` Florian Fainelli
2013-10-28 22:29                               ` Jamal Hadi Salim
2013-10-27 19:51                             ` Felix Fietkau
2013-10-28 22:53                               ` Jamal Hadi Salim
2013-10-29  9:34                                 ` Felix Fietkau
2013-10-30 11:45                                   ` Jamal Hadi Salim
2013-10-30 12:53                                     ` Felix Fietkau
2013-10-30 17:27                                 ` Lennert Buytenhek
2013-10-30 17:34                                   ` Lennert Buytenhek
2013-10-30 17:56                                     ` John Fastabend
2013-10-30 17:56                                     ` John Fastabend
2013-10-30 19:47                                   ` Felix Fietkau
2013-12-07  1:45                                     ` Florian Fainelli
2013-10-29 23:12                 ` Maxime Bizon
2013-10-30 11:50                   ` Jamal Hadi Salim
2013-10-30 11:58                     ` Felix Fietkau
2013-10-30 14:28                     ` Maxime Bizon
2013-10-22 18:23 ` [PATCH 2/4 net-next] tools: add Generic Netlink switch configuration tool Florian Fainelli
2013-10-22 18:23 ` [PATCH 3/4 net-next] net: phy: add Broadcom B53 switch driver Florian Fainelli
2013-10-22 18:23 ` [PATCH 4/4 net-next] net: phy: add fake " Florian Fainelli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.