* [PATCH RFC net-next 0/6] net: reducing memory footprint of network devices
@ 2017-05-06 16:07 David Ahern
  2017-05-06 16:07 ` [PATCH RFC net-next 1/6] net: Add accessor for kobject in a net_device David Ahern
                   ` (6 more replies)
  0 siblings, 7 replies; 17+ messages in thread
From: David Ahern @ 2017-05-06 16:07 UTC (permalink / raw)
  To: netdev; +Cc: roopa, f.fainelli, nicolas.dichtel, David Ahern

As I have mentioned many times[1], at ~43+kB per instance the use of
net_devices does not scale for deployments needing 10,000+ devices. At
netconf 1.2 there was a discussion about using a net_device_common for
the minimal set of common attributes with other structs built on top of
that one for "full" devices. It provided a means for the code to know
"non-standard" net_devices. Conceptually, that approach has its merits
but it is not practical given the sweeping changes required to the code
base. More importantly, though, struct net_device is not the problem; it
weighs in at less than 2kB so reorganizing the code base around a
refactored net_device is not going to solve the problem. The primary
issue is all of the initializations done *because* it is a struct
net_device -- kobject and sysfs and the protocols (e.g., ipv4, ipv6,
mpls, neighbors).

So, how do you keep the desired attributes of a net device -- network
addresses, xmit function, qdisc, netfilter rules, tcpdump -- while
lowering the overhead of a net_device instance and without sweeping
changes across net/ and drivers/net/?

This patch set introduces the concept of labeling net_devices as
"lightweight", first mentioned at netdev 1.1 [1]. Users have to opt
in to lightweight devices by passing a new attribute, IFLA_LWT_NETDEV,
in the new link request. This lightweight tag is meant for virtual
devices such as vlan, vrf, vti, and dummy where the user expects to
create a lot of them and does not want the duplication of resources.
Each device type can always opt out of the lightweight label if necessary
by failing device creation.

Labeling a virtual device as "lightweight" reduces the footprint for
device creation from ~43kB to ~6kB. That reduction in memory is obtained
by:
1. no entry in sysfs
   - kobject in net_device.device is not initialized

2. no entry in procfs
   - no sysctl option for these devices

3. deferred ipv4, ipv6, mpls initialization
   - network layer must be enabled before an address can be assigned
     or mpls labels can be processed
   - enables what Florian called L2 only devices [2]
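
To put the ~43kB and ~6kB figures in terms of the 10,000+ device
deployments mentioned at the top, here is a minimal userspace sketch of
the arithmetic. The per-device costs are the approximate numbers quoted
in this cover letter; the function and macro names are hypothetical and
exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate per-device costs quoted in the cover letter, in kB. */
#define FULL_NETDEV_KB 43u
#define LWT_NETDEV_KB   6u

/* Total footprint in kB for ndevs devices costing per_dev_kb each. */
static uint64_t netdev_footprint_kb(uint64_t ndevs, uint64_t per_dev_kb)
{
	/* Guard against wrap-around; far beyond any realistic device count. */
	assert(per_dev_kb == 0 || ndevs <= UINT64_MAX / per_dev_kb);
	return ndevs * per_dev_kb;
}

/* kB saved by creating ndevs devices as lightweight rather than full. */
static uint64_t netdev_savings_kb(uint64_t ndevs)
{
	return netdev_footprint_kb(ndevs, FULL_NETDEV_KB) -
	       netdev_footprint_kb(ndevs, LWT_NETDEV_KB);
}
```

With these inputs, 10,000 devices come to roughly 430,000 kB as full
net_devices versus roughly 60,000 kB as lightweight ones. Both inputs
are estimates, so the result is only a ballpark.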

Once the core premise of a lightweight device is accepted, follow on
patches can reduce the overhead of network initializations. e.g.,

1. remove devconf per device (ipv4 and ipv6)
   - lightweight devices use the default settings rather than replicate
     the same data for each device

2. reduce / remove / opt out of snmp mibs
   - snmp6_alloc_dev, and icmpv6msg_mib_device specifically, is a heavy
     hitter

Patches can also be found here:
    https://github.com/dsahern/linux lwt-dev-rfc

And iproute2 here:
    https://github.com/dsahern/iproute2 lwt-dev

Example:
    ip li add foo lwd type vrf table 123

- creates VRF device 'foo' as a lightweight netdevice.


[1] http://www.netdevconf.org/1.1/proceedings/slides/ahern-aleksandrov-prabhu-scaling-network-cumulus.pdf
[2] https://www.spinics.net/lists/netdev/msg340808.html
David Ahern (6):
  net: Add accessor for kobject in a net_device
  net: Add flags argument to alloc_netdev_mqs
  net: Introduce IFF_LWT_NETDEV flag
  net: Do not initialize kobject for lightweight netdevs
  net: Delay initializations for lightweight devices
  net: add uapi for creating lightweight devices

 drivers/net/ethernet/mellanox/mlx5/core/ipoib.c |  2 +-
 drivers/net/ethernet/tile/tilegx.c              |  2 +-
 drivers/net/tun.c                               |  2 +-
 drivers/net/wireless/marvell/mwifiex/cfg80211.c |  2 +-
 include/linux/netdevice.h                       | 27 ++++++++--
 include/uapi/linux/if_link.h                    |  1 +
 net/batman-adv/sysfs.c                          | 13 ++++-
 net/bridge/br_if.c                              | 12 +++--
 net/bridge/br_sysfs_br.c                        | 17 +++---
 net/bridge/br_sysfs_if.c                        |  8 ++-
 net/core/dev.c                                  | 71 ++++++++++++++++++-------
 net/core/neighbour.c                            |  3 ++
 net/core/net-sysfs.c                            | 25 ++++++---
 net/core/rtnetlink.c                            | 10 +++-
 net/ethernet/eth.c                              |  2 +-
 net/ipv4/devinet.c                              | 18 ++++++-
 net/ipv6/addrconf.c                             |  9 ++++
 net/mac80211/iface.c                            |  2 +-
 net/mpls/af_mpls.c                              |  6 +++
 net/wireless/core.c                             | 15 ++++--
 20 files changed, 190 insertions(+), 57 deletions(-)

-- 
2.11.0 (Apple Git-81)

* [PATCH RFC net-next 1/6] net: Add accessor for kobject in a net_device
  2017-05-06 16:07 [PATCH RFC net-next 0/6] net: reducing memory footprint of network devices David Ahern
@ 2017-05-06 16:07 ` David Ahern
  2017-05-06 16:07 ` [PATCH RFC net-next 2/6] net: Add flags argument to alloc_netdev_mqs David Ahern
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: David Ahern @ 2017-05-06 16:07 UTC (permalink / raw)
  To: netdev; +Cc: roopa, f.fainelli, nicolas.dichtel, David Ahern

Signed-off-by: David Ahern <dsahern@gmail.com>
---
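Not part of the patch: a minimal userspace sketch of the calling
convention the conversions in this patch follow. Callers fetch the
kobject through an accessor and treat a NULL result as "nothing to do"
rather than as an error. In this patch the accessor always returns a
valid pointer; the NULL case is assumed here to come from the
lightweight check added later in the series. All model_* names are
hypothetical stand-ins, not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_kobject {
	int links;			/* counts sysfs entries created */
};

struct model_netdev {
	bool lightweight;		/* stand-in for the lightweight flag */
	struct model_kobject kobj;	/* stand-in for dev->dev.kobj */
};

/* Accessor: NULL means the device has no sysfs presence. */
static struct model_kobject *model_netdev_kobject(struct model_netdev *dev)
{
	assert(dev != NULL);
	return dev->lightweight ? NULL : &dev->kobj;
}

/* Caller convention: a missing kobject is success, not failure. */
static int model_sysfs_add(struct model_netdev *dev)
{
	struct model_kobject *kobj = model_netdev_kobject(dev);

	if (!kobj)
		return 0;

	kobj->links++;
	return 0;
}
```

Returning 0 for the NULL case keeps device creation from failing just
because there is no sysfs directory to populate.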
 include/linux/netdevice.h |  5 +++++
 net/batman-adv/sysfs.c    | 13 +++++++++--
 net/bridge/br_if.c        | 12 ++++++----
 net/bridge/br_sysfs_br.c  | 17 +++++++++-----
 net/bridge/br_sysfs_if.c  |  8 +++++--
 net/core/dev.c            | 57 ++++++++++++++++++++++++++++++++++-------------
 net/core/net-sysfs.c      | 11 +++++----
 net/wireless/core.c       | 15 +++++++++----
 8 files changed, 100 insertions(+), 38 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9c23bd2efb56..305d2d42b349 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4272,6 +4272,11 @@ static inline const char *netdev_reg_state(const struct net_device *dev)
 	return " (unknown)";
 }
 
+static inline struct kobject *netdev_kobject(struct net_device *dev)
+{
+	return &dev->dev.kobj;
+}
+
 __printf(3, 4)
 void netdev_printk(const char *level, const struct net_device *dev,
 		   const char *format, ...);
diff --git a/net/batman-adv/sysfs.c b/net/batman-adv/sysfs.c
index 0ae8b30e4eaa..a8a7294fc054 100644
--- a/net/batman-adv/sysfs.c
+++ b/net/batman-adv/sysfs.c
@@ -735,11 +735,14 @@ static struct batadv_attribute *batadv_vlan_attrs[] = {
 
 int batadv_sysfs_add_meshif(struct net_device *dev)
 {
-	struct kobject *batif_kobject = &dev->dev.kobj;
+	struct kobject *batif_kobject = netdev_kobject(dev);
 	struct batadv_priv *bat_priv = netdev_priv(dev);
 	struct batadv_attribute **bat_attr;
 	int err;
 
+	if (!batif_kobject)
+		return 0;
+
 	bat_priv->mesh_obj = kobject_create_and_add(BATADV_SYSFS_IF_MESH_SUBDIR,
 						    batif_kobject);
 	if (!bat_priv->mesh_obj) {
@@ -778,6 +781,9 @@ void batadv_sysfs_del_meshif(struct net_device *dev)
 	struct batadv_priv *bat_priv = netdev_priv(dev);
 	struct batadv_attribute **bat_attr;
 
+	if (!bat_priv->mesh_obj)
+		return;
+
 	for (bat_attr = batadv_mesh_attrs; *bat_attr; ++bat_attr)
 		sysfs_remove_file(bat_priv->mesh_obj, &((*bat_attr)->attr));
 
@@ -1132,10 +1138,13 @@ static struct batadv_attribute *batadv_batman_attrs[] = {
 
 int batadv_sysfs_add_hardif(struct kobject **hardif_obj, struct net_device *dev)
 {
-	struct kobject *hardif_kobject = &dev->dev.kobj;
+	struct kobject *hardif_kobject = netdev_kobject(dev);
 	struct batadv_attribute **bat_attr;
 	int err;
 
+	if (!hardif_kobject)
+		return 0;
+
 	*hardif_obj = kobject_create_and_add(BATADV_SYSFS_IF_BAT_SUBDIR,
 					     hardif_kobject);
 
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index 7f8d05cf9065..a5354436ada8 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -485,6 +485,7 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
 	struct net_bridge_port *p;
 	int err = 0;
 	unsigned br_hr, dev_hr;
+	struct kobject *kobj;
 	bool changed_addr;
 
 	/* Don't allow bridging non-ethernet like devices, or DSA-enabled
@@ -521,10 +522,13 @@ int br_add_if(struct net_bridge *br, struct net_device *dev)
 	if (err)
 		goto put_back;
 
-	err = kobject_init_and_add(&p->kobj, &brport_ktype, &(dev->dev.kobj),
-				   SYSFS_BRIDGE_PORT_ATTR);
-	if (err)
-		goto err1;
+	kobj = netdev_kobject(dev);
+	if (kobj) {
+		err = kobject_init_and_add(&p->kobj, &brport_ktype, kobj,
+					   SYSFS_BRIDGE_PORT_ATTR);
+		if (err)
+			goto err1;
+	}
 
 	err = br_sysfs_addif(p);
 	if (err)
diff --git a/net/bridge/br_sysfs_br.c b/net/bridge/br_sysfs_br.c
index 0b5dd607444c..f6439664ffea 100644
--- a/net/bridge/br_sysfs_br.c
+++ b/net/bridge/br_sysfs_br.c
@@ -917,10 +917,13 @@ static struct bin_attribute bridge_forward = {
  */
 int br_sysfs_addbr(struct net_device *dev)
 {
-	struct kobject *brobj = &dev->dev.kobj;
+	struct kobject *brobj = netdev_kobject(dev);
 	struct net_bridge *br = netdev_priv(dev);
 	int err;
 
+	if (!brobj)
+		return 0;
+
 	err = sysfs_create_group(brobj, &bridge_group);
 	if (err) {
 		pr_info("%s: can't create group %s/%s\n",
@@ -944,9 +947,9 @@ int br_sysfs_addbr(struct net_device *dev)
 	}
 	return 0;
  out3:
-	sysfs_remove_bin_file(&dev->dev.kobj, &bridge_forward);
+	sysfs_remove_bin_file(brobj, &bridge_forward);
  out2:
-	sysfs_remove_group(&dev->dev.kobj, &bridge_group);
+	sysfs_remove_group(brobj, &bridge_group);
  out1:
 	return err;
 
@@ -954,10 +957,12 @@ int br_sysfs_addbr(struct net_device *dev)
 
 void br_sysfs_delbr(struct net_device *dev)
 {
-	struct kobject *kobj = &dev->dev.kobj;
+	struct kobject *kobj = netdev_kobject(dev);
 	struct net_bridge *br = netdev_priv(dev);
 
 	kobject_put(br->ifobj);
-	sysfs_remove_bin_file(kobj, &bridge_forward);
-	sysfs_remove_group(kobj, &bridge_group);
+	if (kobj) {
+		sysfs_remove_bin_file(kobj, &bridge_forward);
+		sysfs_remove_group(kobj, &bridge_group);
+	}
 }
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index 5d5d413a6cf8..4256e78f6c9f 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -283,10 +283,14 @@ int br_sysfs_addif(struct net_bridge_port *p)
 {
 	struct net_bridge *br = p->br;
 	const struct brport_attribute **a;
+	struct kobject *br_kobj;
 	int err;
 
-	err = sysfs_create_link(&p->kobj, &br->dev->dev.kobj,
-				SYSFS_BRIDGE_PORT_LINK);
+	br_kobj = netdev_kobject(br->dev);
+	if (!br_kobj)
+		return 0;
+
+	err = sysfs_create_link(&p->kobj, br_kobj, SYSFS_BRIDGE_PORT_LINK);
 	if (err)
 		return err;
 
diff --git a/net/core/dev.c b/net/core/dev.c
index d07aa5ffb511..f166b3bf1895 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5910,22 +5910,33 @@ static int netdev_adjacent_sysfs_add(struct net_device *dev,
 			      struct net_device *adj_dev,
 			      struct list_head *dev_list)
 {
+	struct kobject *dev_kobj, *adj_kobj;
 	char linkname[IFNAMSIZ+7];
+	int rc = 0;
 
-	sprintf(linkname, dev_list == &dev->adj_list.upper ?
-		"upper_%s" : "lower_%s", adj_dev->name);
-	return sysfs_create_link(&(dev->dev.kobj), &(adj_dev->dev.kobj),
-				 linkname);
+	dev_kobj = netdev_kobject(dev);
+	adj_kobj = netdev_kobject(adj_dev);
+
+	if (dev_kobj && adj_kobj) {
+		sprintf(linkname, dev_list == &dev->adj_list.upper ?
+			"upper_%s" : "lower_%s", adj_dev->name);
+		rc = sysfs_create_link(dev_kobj, adj_kobj, linkname);
+	}
+	return rc;
 }
+
 static void netdev_adjacent_sysfs_del(struct net_device *dev,
 			       char *name,
 			       struct list_head *dev_list)
 {
+	struct kobject *kobj = netdev_kobject(dev);
 	char linkname[IFNAMSIZ+7];
 
-	sprintf(linkname, dev_list == &dev->adj_list.upper ?
-		"upper_%s" : "lower_%s", name);
-	sysfs_remove_link(&(dev->dev.kobj), linkname);
+	if (kobj) {
+		sprintf(linkname, dev_list == &dev->adj_list.upper ?
+			"upper_%s" : "lower_%s", name);
+		sysfs_remove_link(kobj, linkname);
+	}
 }
 
 static inline bool netdev_adjacent_is_neigh_list(struct net_device *dev,
@@ -5976,11 +5987,14 @@ static int __netdev_adjacent_dev_insert(struct net_device *dev,
 
 	/* Ensure that master link is always the first item in list. */
 	if (master) {
-		ret = sysfs_create_link(&(dev->dev.kobj),
-					&(adj_dev->dev.kobj), "master");
-		if (ret)
-			goto remove_symlinks;
+		struct kobject *dev_kobj = netdev_kobject(dev);
+		struct kobject *adj_kobj = netdev_kobject(adj_dev);
 
+		if (dev_kobj && adj_kobj) {
+			ret = sysfs_create_link(dev_kobj, adj_kobj, "master");
+			if (ret)
+				goto remove_symlinks;
+		}
 		list_add_rcu(&adj->list, dev_list);
 	} else {
 		list_add_tail_rcu(&adj->list, dev_list);
@@ -6025,8 +6039,12 @@ static void __netdev_adjacent_dev_remove(struct net_device *dev,
 		return;
 	}
 
-	if (adj->master)
-		sysfs_remove_link(&(dev->dev.kobj), "master");
+	if (adj->master) {
+		struct kobject *kobj = netdev_kobject(dev);
+
+		if (kobj)
+			sysfs_remove_link(kobj, "master");
+	}
 
 	if (netdev_adjacent_is_neigh_list(dev, adj_dev, dev_list))
 		netdev_adjacent_sysfs_del(dev, adj_dev->name, dev_list);
@@ -7665,6 +7683,7 @@ void netdev_run_todo(void)
 		rcu_barrier();
 
 	while (!list_empty(&list)) {
+		struct kobject *kobj;
 		struct net_device *dev
 			= list_first_entry(&list, struct net_device, todo_list);
 		list_del(&dev->todo_list);
@@ -7702,7 +7721,9 @@ void netdev_run_todo(void)
 		wake_up(&netdev_unregistering_wq);
 
 		/* Free network device */
-		kobject_put(&dev->dev.kobj);
+		kobj = netdev_kobject(dev);
+		if (kobj)
+			kobject_put(kobj);
 	}
 }
 
@@ -8071,6 +8092,7 @@ EXPORT_SYMBOL(unregister_netdev);
 
 int dev_change_net_namespace(struct net_device *dev, struct net *net, const char *pat)
 {
+	struct kobject *kobj;
 	int err;
 
 	ASSERT_RTNL();
@@ -8136,7 +8158,9 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	dev_mc_flush(dev);
 
 	/* Send a netdev-removed uevent to the old namespace */
-	kobject_uevent(&dev->dev.kobj, KOBJ_REMOVE);
+	kobj = netdev_kobject(dev);
+	if (kobj)
+		kobject_uevent(kobj, KOBJ_REMOVE);
 	netdev_adjacent_del_links(dev);
 
 	/* Actually switch the network namespace */
@@ -8147,7 +8171,8 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 		dev->ifindex = dev_new_index(net);
 
 	/* Send a netdev-add uevent to the new namespace */
-	kobject_uevent(&dev->dev.kobj, KOBJ_ADD);
+	if (kobj)
+		kobject_uevent(kobj, KOBJ_ADD);
 	netdev_adjacent_add_links(dev);
 
 	/* Fixup kobjects */
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 65ea0ff4017c..9df53b688f5b 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1390,10 +1390,13 @@ static int register_queue_kobjects(struct net_device *dev)
 	int error = 0, txq = 0, rxq = 0, real_rx = 0, real_tx = 0;
 
 #ifdef CONFIG_SYSFS
-	dev->queues_kset = kset_create_and_add("queues",
-	    NULL, &dev->dev.kobj);
-	if (!dev->queues_kset)
-		return -ENOMEM;
+	struct kobject *kobj = netdev_kobject(dev);
+
+	if (kobj) {
+		dev->queues_kset = kset_create_and_add("queues", NULL, kobj);
+		if (!dev->queues_kset)
+			return -ENOMEM;
+	}
 	real_rx = dev->real_num_rx_queues;
 #endif
 	real_tx = dev->real_num_tx_queues;
diff --git a/net/wireless/core.c b/net/wireless/core.c
index 83ea164f16b3..a73b3efc17b2 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -1122,6 +1122,7 @@ static int cfg80211_netdev_notifier_call(struct notifier_block *nb,
 	struct wireless_dev *wdev = dev->ieee80211_ptr;
 	struct cfg80211_registered_device *rdev;
 	struct cfg80211_sched_scan_request *pos, *tmp;
+	struct kobject *kobj;
 
 	if (!wdev)
 		return NOTIFY_DONE;
@@ -1160,9 +1161,12 @@ static int cfg80211_netdev_notifier_call(struct notifier_block *nb,
 		/* can only change netns with wiphy */
 		dev->features |= NETIF_F_NETNS_LOCAL;
 
-		if (sysfs_create_link(&dev->dev.kobj, &rdev->wiphy.dev.kobj,
-				      "phy80211")) {
-			pr_err("failed to add phy80211 symlink to netdev!\n");
+		kobj = netdev_kobject(dev);
+		if (kobj) {
+			if (sysfs_create_link(kobj, &rdev->wiphy.dev.kobj,
+					      "phy80211")) {
+				pr_err("failed to add phy80211 symlink to netdev!\n");
+			}
 		}
 		wdev->netdev = dev;
 #ifdef CONFIG_CFG80211_WEXT
@@ -1264,9 +1268,12 @@ static int cfg80211_netdev_notifier_call(struct notifier_block *nb,
 		 * remove and clean it up.
 		 */
 		if (!list_empty(&wdev->list)) {
+			struct kobject *kobj = netdev_kobject(dev);
+
 			nl80211_notify_iface(rdev, wdev,
 					     NL80211_CMD_DEL_INTERFACE);
-			sysfs_remove_link(&dev->dev.kobj, "phy80211");
+			if (kobj)
+				sysfs_remove_link(kobj, "phy80211");
 			list_del_rcu(&wdev->list);
 			rdev->devlist_generation++;
 			cfg80211_mlme_purge_registrations(wdev);
-- 
2.11.0 (Apple Git-81)

* [PATCH RFC net-next 2/6] net: Add flags argument to alloc_netdev_mqs
  2017-05-06 16:07 [PATCH RFC net-next 0/6] net: reducing memory footprint of network devices David Ahern
  2017-05-06 16:07 ` [PATCH RFC net-next 1/6] net: Add accessor for kobject in a net_device David Ahern
@ 2017-05-06 16:07 ` David Ahern
  2017-05-06 16:07 ` [PATCH RFC net-next 3/6] net: Introduce IFF_LWT_NETDEV flag David Ahern
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 17+ messages in thread
From: David Ahern @ 2017-05-06 16:07 UTC (permalink / raw)
  To: netdev; +Cc: roopa, f.fainelli, nicolas.dichtel, David Ahern

Used in a later patch to pass in flags at device creation time.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
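Not part of the patch: a small userspace sketch of what the new
trailing argument does. The flags are OR'ed into priv_flags, so a
caller passing 0 sees no change in behavior. The sketch assumes the
setup callback has already run by the time the OR happens, which the
placement of the hunk in alloc_netdev_mqs() suggests but which is not
shown here. All model_* names and flag values are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_IFF_NO_QUEUE   (1u << 0)	/* hypothetical flag set by setup() */
#define MODEL_IFF_LWT_NETDEV (1u << 1)	/* hypothetical create-time flag */

struct model_netdev {
	unsigned int priv_flags;
};

/* Stand-in for a driver setup callback that sets its own priv_flags. */
static void model_setup(struct model_netdev *dev)
{
	dev->priv_flags |= MODEL_IFF_NO_QUEUE;
}

/* Mirrors the new argument: flags are OR'ed in after setup() has run. */
static void model_alloc_netdev(struct model_netdev *dev,
			       void (*setup)(struct model_netdev *),
			       unsigned int flags)
{
	assert(dev != NULL && setup != NULL);
	dev->priv_flags = 0;
	setup(dev);
	dev->priv_flags |= flags;
}
```

Because the flags are OR'ed rather than assigned, anything the setup
callback set survives, and the existing callers converted here to pass
0 are unaffected.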
 drivers/net/ethernet/mellanox/mlx5/core/ipoib.c | 2 +-
 drivers/net/ethernet/tile/tilegx.c              | 2 +-
 drivers/net/tun.c                               | 2 +-
 drivers/net/wireless/marvell/mwifiex/cfg80211.c | 2 +-
 include/linux/netdevice.h                       | 7 ++++---
 net/core/dev.c                                  | 5 ++++-
 net/core/rtnetlink.c                            | 2 +-
 net/ethernet/eth.c                              | 2 +-
 net/mac80211/iface.c                            | 2 +-
 9 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
index 3c84e36af018..f5aaa92726a2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib.c
@@ -446,7 +446,7 @@ static struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev,
 				  name, NET_NAME_UNKNOWN,
 				  setup,
 				  nch * MLX5E_MAX_NUM_TC,
-				  nch);
+				  nch, 0);
 	if (!netdev) {
 		mlx5_core_warn(mdev, "alloc_netdev_mqs failed\n");
 		goto free_mdev_resources;
diff --git a/drivers/net/ethernet/tile/tilegx.c b/drivers/net/ethernet/tile/tilegx.c
index 7c634bc75615..f38067e260bd 100644
--- a/drivers/net/ethernet/tile/tilegx.c
+++ b/drivers/net/ethernet/tile/tilegx.c
@@ -2198,7 +2198,7 @@ static void tile_net_dev_init(const char *name, const uint8_t *mac)
 	 * template, instantiated by register_netdev(), but not for us.
 	 */
 	dev = alloc_netdev_mqs(sizeof(*priv), name, NET_NAME_UNKNOWN,
-			       tile_net_setup, NR_CPUS, 1);
+			       tile_net_setup, NR_CPUS, 1, 0);
 	if (!dev) {
 		pr_err("alloc_netdev_mqs(%s) failed\n", name);
 		return;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index bbd707b9ef7a..030621621ea8 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1804,7 +1804,7 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 
 		dev = alloc_netdev_mqs(sizeof(struct tun_struct), name,
 				       NET_NAME_UNKNOWN, tun_setup, queues,
-				       queues);
+				       queues, 0);
 
 		if (!dev)
 			return -ENOMEM;
diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
index 7ec06bf13413..38b6570ff1cd 100644
--- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
+++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
@@ -2960,7 +2960,7 @@ struct wireless_dev *mwifiex_add_virtual_intf(struct wiphy *wiphy,
 
 	dev = alloc_netdev_mqs(sizeof(struct mwifiex_private *), name,
 			       name_assign_type, ether_setup,
-			       IEEE80211_NUM_ACS, 1);
+			       IEEE80211_NUM_ACS, 1, 0);
 	if (!dev) {
 		mwifiex_dbg(adapter, ERROR,
 			    "no memory available for netdevice\n");
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 305d2d42b349..f47c8712398a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3699,13 +3699,14 @@ void ether_setup(struct net_device *dev);
 struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 				    unsigned char name_assign_type,
 				    void (*setup)(struct net_device *),
-				    unsigned int txqs, unsigned int rxqs);
+				    unsigned int txqs, unsigned int rxqs,
+				    unsigned int flags);
 #define alloc_netdev(sizeof_priv, name, name_assign_type, setup) \
-	alloc_netdev_mqs(sizeof_priv, name, name_assign_type, setup, 1, 1)
+	alloc_netdev_mqs(sizeof_priv, name, name_assign_type, setup, 1, 1, 0)
 
 #define alloc_netdev_mq(sizeof_priv, name, name_assign_type, setup, count) \
 	alloc_netdev_mqs(sizeof_priv, name, name_assign_type, setup, count, \
-			 count)
+			 count, 0)
 
 int register_netdev(struct net_device *dev);
 void unregister_netdev(struct net_device *dev);
diff --git a/net/core/dev.c b/net/core/dev.c
index f166b3bf1895..48a0252037d5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7829,6 +7829,7 @@ void netdev_freemem(struct net_device *dev)
  * @setup: callback to initialize device
  * @txqs: the number of TX subqueues to allocate
  * @rxqs: the number of RX subqueues to allocate
+ * @flags: flags to 'or' with priv_flags
  *
  * Allocates a struct net_device with private data area for driver use
  * and performs basic initialization.  Also allocates subqueue structs
@@ -7837,7 +7838,8 @@ void netdev_freemem(struct net_device *dev)
 struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 		unsigned char name_assign_type,
 		void (*setup)(struct net_device *),
-		unsigned int txqs, unsigned int rxqs)
+		unsigned int txqs, unsigned int rxqs,
+		unsigned int flags)
 {
 	struct net_device *dev;
 	size_t alloc_size;
@@ -7920,6 +7922,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 	if (netif_alloc_rx_queues(dev))
 		goto free_all;
 #endif
+	dev->priv_flags |= flags;
 
 	strcpy(dev->name, name);
 	dev->name_assign_type = name_assign_type;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index bcb0f610ee42..a4db1cd91c4a 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2390,7 +2390,7 @@ struct net_device *rtnl_create_link(struct net *net,
 		num_rx_queues = ops->get_num_rx_queues();
 
 	dev = alloc_netdev_mqs(ops->priv_size, ifname, name_assign_type,
-			       ops->setup, num_tx_queues, num_rx_queues);
+			       ops->setup, num_tx_queues, num_rx_queues, 0);
 	if (!dev)
 		return ERR_PTR(-ENOMEM);
 
diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c
index 1446810047f5..d8f489e134f0 100644
--- a/net/ethernet/eth.c
+++ b/net/ethernet/eth.c
@@ -389,7 +389,7 @@ struct net_device *alloc_etherdev_mqs(int sizeof_priv, unsigned int txqs,
 				      unsigned int rxqs)
 {
 	return alloc_netdev_mqs(sizeof_priv, "eth%d", NET_NAME_UNKNOWN,
-				ether_setup, txqs, rxqs);
+				ether_setup, txqs, rxqs, 0);
 }
 EXPORT_SYMBOL(alloc_etherdev_mqs);
 
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 3bd5b81f5d81..54891601e3d1 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -1802,7 +1802,7 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name,
 
 		ndev = alloc_netdev_mqs(size + txq_size,
 					name, name_assign_type,
-					if_setup, txqs, 1);
+					if_setup, txqs, 1, 0);
 		if (!ndev)
 			return -ENOMEM;
 		dev_net_set(ndev, wiphy_net(local->hw.wiphy));
-- 
2.11.0 (Apple Git-81)

* [PATCH RFC net-next 3/6] net: Introduce IFF_LWT_NETDEV flag
  2017-05-06 16:07 [PATCH RFC net-next 0/6] net: reducing memory footprint of network devices David Ahern
  2017-05-06 16:07 ` [PATCH RFC net-next 1/6] net: Add accessor for kobject in a net_device David Ahern
  2017-05-06 16:07 ` [PATCH RFC net-next 2/6] net: Add flags argument to alloc_netdev_mqs David Ahern
@ 2017-05-06 16:07 ` David Ahern
  2017-05-08  8:55   ` Johannes Berg
  2017-05-06 16:07 ` [PATCH RFC net-next 4/6] net: Do not initialize kobject for lightweight netdevs David Ahern
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: David Ahern @ 2017-05-06 16:07 UTC (permalink / raw)
  To: netdev; +Cc: roopa, f.fainelli, nicolas.dichtel, David Ahern

Add new flag to denote lightweight netdevices. Add helper to identify
such devices.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
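Not part of the patch: a userspace sketch of the flag test the new
helper performs. The two bit positions are the ones used in this patch
(1<<27 for the existing macsec flag, 1<<28 for the new one); the struct
and the model_* names are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit positions as used in this patch; the names are stand-ins. */
enum model_priv_flags {
	MODEL_IFF_MACSEC     = 1 << 27,
	MODEL_IFF_LWT_NETDEV = 1 << 28,
};

struct model_netdev {
	unsigned int priv_flags;
};

/* True only when the lightweight bit is set; other bits are ignored. */
static bool model_netif_is_lwd(const struct model_netdev *dev)
{
	assert(dev != NULL);
	return !!(dev->priv_flags & MODEL_IFF_LWT_NETDEV);
}
```

The sketch takes a const pointer, as netif_is_macsec() does. Whether
the real helper should do the same is a judgment call for the series,
not something this sketch settles.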
 include/linux/netdevice.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f47c8712398a..08151fd34973 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1401,6 +1401,7 @@ enum netdev_priv_flags {
 	IFF_RXFH_CONFIGURED		= 1<<25,
 	IFF_PHONY_HEADROOM		= 1<<26,
 	IFF_MACSEC			= 1<<27,
+	IFF_LWT_NETDEV			= 1<<28,
 };
 
 #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
@@ -1430,6 +1431,7 @@ enum netdev_priv_flags {
 #define IFF_TEAM			IFF_TEAM
 #define IFF_RXFH_CONFIGURED		IFF_RXFH_CONFIGURED
 #define IFF_MACSEC			IFF_MACSEC
+#define IFF_LWT_NETDEV			IFF_LWT_NETDEV
 
 /**
  *	struct net_device - The DEVICE structure.
@@ -4137,6 +4139,11 @@ static inline void skb_gso_error_unwind(struct sk_buff *skb, __be16 protocol,
 	skb->mac_len = mac_len;
 }
 
+static inline bool netif_is_lwd(struct net_device *dev)
+{
+	return !!(dev->priv_flags & IFF_LWT_NETDEV);
+}
+
 static inline bool netif_is_macsec(const struct net_device *dev)
 {
 	return dev->priv_flags & IFF_MACSEC;
-- 
2.11.0 (Apple Git-81)

* [PATCH RFC net-next 4/6] net: Do not initialize kobject for lightweight netdevs
  2017-05-06 16:07 [PATCH RFC net-next 0/6] net: reducing memory footprint of network devices David Ahern
                   ` (2 preceding siblings ...)
  2017-05-06 16:07 ` [PATCH RFC net-next 3/6] net: Introduce IFF_LWT_NETDEV flag David Ahern
@ 2017-05-06 16:07 ` David Ahern
  2017-05-08 17:26   ` Florian Fainelli
  2017-05-06 16:07 ` [PATCH RFC net-next 5/6] net: Delay initializations for lightweight devices David Ahern
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 17+ messages in thread
From: David Ahern @ 2017-05-06 16:07 UTC (permalink / raw)
  To: netdev; +Cc: roopa, f.fainelli, nicolas.dichtel, David Ahern

Lightweight netdevices are not added to sysfs; bypass kobject
initialization.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
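Not part of the patch: a userspace sketch of why the register,
unregister and free paths have to skip the embedded device together.
If registration never takes the initial reference, the free path must
not drop one. The refcount model below is a deliberate simplification
of device_initialize() and put_device(), and all model_* names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_device {
	int refcount;		/* simplified stand-in for the kobject kref */
	bool registered;
};

struct model_netdev {
	bool lightweight;
	struct model_device dev;
};

/* Register: lightweight devices never initialize the embedded device. */
static int model_register_kobject(struct model_netdev *ndev)
{
	assert(ndev != NULL);
	if (ndev->lightweight)
		return 0;

	ndev->dev.refcount = 1;
	ndev->dev.registered = true;
	return 0;
}

/* Unregister: nothing was added, so nothing is deleted. */
static void model_unregister_kobject(struct model_netdev *ndev)
{
	assert(ndev != NULL);
	if (!ndev->lightweight)
		ndev->dev.registered = false;
}

/* Free: only drop the reference that registration actually took. */
static void model_free_netdev(struct model_netdev *ndev)
{
	assert(ndev != NULL);
	if (!ndev->lightweight)
		ndev->dev.refcount--;
}
```

In this model the count returns to zero for a full device and never
moves for a lightweight one, which is the balance the three guarded
paths need to preserve.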
 include/linux/netdevice.h |  3 +++
 net/core/dev.c            |  9 ++++++---
 net/core/net-sysfs.c      | 14 +++++++++++---
 3 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 08151fd34973..4ddd0ac7e1cb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4282,6 +4282,9 @@ static inline const char *netdev_reg_state(const struct net_device *dev)
 
 static inline struct kobject *netdev_kobject(struct net_device *dev)
 {
+	if (netif_is_lwd(dev))
+		return NULL;
+
 	return &dev->dev.kobj;
 }
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 48a0252037d5..52bb01041d12 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7993,7 +7993,8 @@ void free_netdev(struct net_device *dev)
 	dev->reg_state = NETREG_RELEASED;
 
 	/* will free via device release */
-	put_device(&dev->dev);
+	if (!netif_is_lwd(dev))
+		put_device(&dev->dev);
 }
 EXPORT_SYMBOL(free_netdev);
 
@@ -8179,8 +8180,10 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, const char
 	netdev_adjacent_add_links(dev);
 
 	/* Fixup kobjects */
-	err = device_rename(&dev->dev, dev->name);
-	WARN_ON(err);
+	if (!netif_is_lwd(dev)) {
+		err = device_rename(&dev->dev, dev->name);
+		WARN_ON(err);
+	}
 
 	/* Add the device back in the hashes */
 	list_netdevice(dev);
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 9df53b688f5b..725348cdeb3b 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1559,18 +1559,22 @@ EXPORT_SYMBOL(of_find_net_device_by_node);
  */
 void netdev_unregister_kobject(struct net_device *ndev)
 {
+	struct kobject *kobj = netdev_kobject(ndev);
 	struct device *dev = &(ndev->dev);
 
 	if (!atomic_read(&dev_net(ndev)->count))
 		dev_set_uevent_suppress(dev, 1);
 
-	kobject_get(&dev->kobj);
+	if (kobj)
+		kobject_get(kobj);
 
-	remove_queue_kobjects(ndev);
+	if (!netif_is_lwd(ndev))
+		remove_queue_kobjects(ndev);
 
 	pm_runtime_set_memalloc_noio(dev, false);
 
-	device_del(dev);
+	if (!netif_is_lwd(ndev))
+		device_del(dev);
 }
 
 /* Create sysfs entries for network device. */
@@ -1580,6 +1584,9 @@ int netdev_register_kobject(struct net_device *ndev)
 	const struct attribute_group **groups = ndev->sysfs_groups;
 	int error = 0;
 
+	if (netif_is_lwd(ndev))
+		goto pm;
+
 	device_initialize(dev);
 	dev->class = &net_class;
 	dev->platform_data = ndev;
@@ -1614,6 +1621,7 @@ int netdev_register_kobject(struct net_device *ndev)
 		return error;
 	}
 
+pm:
 	pm_runtime_set_memalloc_noio(dev, true);
 
 	return error;
-- 
2.11.0 (Apple Git-81)

* [PATCH RFC net-next 5/6] net: Delay initializations for lightweight devices
  2017-05-06 16:07 [PATCH RFC net-next 0/6] net: reducing memory footprint of network devices David Ahern
                   ` (3 preceding siblings ...)
  2017-05-06 16:07 ` [PATCH RFC net-next 4/6] net: Do not initialize kobject for lightweight netdevs David Ahern
@ 2017-05-06 16:07 ` David Ahern
  2017-05-08 17:31   ` Florian Fainelli
  2017-05-06 16:07 ` [PATCH RFC net-next 6/6] net: add uapi for creating " David Ahern
  2017-05-08 17:35 ` [PATCH RFC net-next 0/6] net: reducing memory footprint of network devices Florian Fainelli
  6 siblings, 1 reply; 17+ messages in thread
From: David Ahern @ 2017-05-06 16:07 UTC (permalink / raw)
  To: netdev; +Cc: roopa, f.fainelli, nicolas.dichtel, David Ahern

Delay ipv4 and ipv6 initializations on lightweight netdevices until an
address is added to the device.

Skip sysctl initialization for neighbor path as well.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
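Not part of the patch: a userspace sketch of the deferred
initialization described above, using ipv4 as the example. The
register event skips protocol setup for lightweight devices, and the
first address add performs it on demand. The model_* names are
hypothetical stand-ins for inetdev_event(), rtm_to_ifaddr() and
inetdev_init(), and the model init cannot fail, unlike the real one.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_in_device {
	int naddrs;
};

struct model_netdev {
	bool lightweight;
	struct model_in_device *in_dev;		/* NULL until ipv4 is set up */
	struct model_in_device storage;		/* backing store for the model */
};

/* Stand-in for inetdev_init(): attach ipv4 state to the device. */
static struct model_in_device *model_inetdev_init(struct model_netdev *dev)
{
	dev->storage.naddrs = 0;
	dev->in_dev = &dev->storage;
	return dev->in_dev;
}

/* Register event: full devices get ipv4 state now, lightweight ones later. */
static void model_register_event(struct model_netdev *dev)
{
	assert(dev != NULL);
	if (!dev->lightweight)
		model_inetdev_init(dev);
}

/* Address add: initialize on demand for lightweight devices only. */
static int model_add_addr(struct model_netdev *dev)
{
	struct model_in_device *in_dev;

	assert(dev != NULL);
	in_dev = dev->in_dev;
	if (!in_dev) {
		if (dev->lightweight)
			in_dev = model_inetdev_init(dev);
		if (!in_dev)
			return -ENOBUFS;
	}

	in_dev->naddrs++;
	return 0;
}
```

A full device that somehow has no ipv4 state still gets -ENOBUFS, as
before; only lightweight devices take the on-demand path.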
 include/linux/netdevice.h |  5 +++++
 net/core/neighbour.c      |  3 +++
 net/ipv4/devinet.c        | 18 ++++++++++++++++--
 net/ipv6/addrconf.c       |  9 +++++++++
 net/mpls/af_mpls.c        |  6 ++++++
 5 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 4ddd0ac7e1cb..32d155be777a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4144,6 +4144,11 @@ static inline bool netif_is_lwd(struct net_device *dev)
 	return !!(dev->priv_flags & IFF_LWT_NETDEV);
 }
 
+static inline bool netif_has_sysctl(struct net_device *dev)
+{
+	return !netif_is_lwd(dev);
+}
+
 static inline bool netif_is_macsec(const struct net_device *dev)
 {
 	return dev->priv_flags & IFF_MACSEC;
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 58b0bcc125b5..10104a7135e2 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -3123,6 +3123,9 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
 	char neigh_path[ sizeof("net//neigh/") + IFNAMSIZ + IFNAMSIZ ];
 	char *p_name;
 
+	if (dev && !netif_has_sysctl(dev))
+		return 0;
+
 	t = kmemdup(&neigh_sysctl_template, sizeof(*t), GFP_KERNEL);
 	if (!t)
 		goto err;
diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
index df14815a3b8c..c5ffd3ed4b2c 100644
--- a/net/ipv4/devinet.c
+++ b/net/ipv4/devinet.c
@@ -771,8 +771,15 @@ static struct in_ifaddr *rtm_to_ifaddr(struct net *net, struct nlmsghdr *nlh,
 
 	in_dev = __in_dev_get_rtnl(dev);
 	err = -ENOBUFS;
-	if (!in_dev)
-		goto errout;
+	if (!in_dev) {
+		if (netif_is_lwd(dev)) {
+			in_dev = inetdev_init(dev);
+			if (IS_ERR(in_dev))
+				in_dev = NULL;
+		}
+		if (!in_dev)
+			goto errout;
+	}
 
 	ifa = inet_alloc_ifa();
 	if (!ifa)
@@ -1417,6 +1424,10 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,
 
 	if (!in_dev) {
 		if (event == NETDEV_REGISTER) {
+			/* inet init is deferred for lightweight devices */
+			if (netif_is_lwd(dev))
+				goto out;
+
 			in_dev = inetdev_init(dev);
 			if (IS_ERR(in_dev))
 				return notifier_from_errno(PTR_ERR(in_dev));
@@ -2303,6 +2314,9 @@ static int devinet_sysctl_register(struct in_device *idev)
 {
 	int err;
 
+	if (!netif_has_sysctl(idev->dev))
+		return 0;
+
 	if (!sysctl_dev_name_is_allowed(idev->dev->name))
 		return -EINVAL;
 
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 8d297a79b568..9814df6b7017 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3371,6 +3371,10 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
 
 	switch (event) {
 	case NETDEV_REGISTER:
+		/* inet6 init is deferred for lightweight devices */
+		if (netif_is_lwd(dev))
+			return NOTIFY_OK;
+
 		if (!idev && dev->mtu >= IPV6_MIN_MTU) {
 			idev = ipv6_add_dev(dev);
 			if (IS_ERR(idev))
@@ -6368,6 +6372,11 @@ static int __addrconf_sysctl_register(struct net *net, char *dev_name,
 	struct ctl_table *table;
 	char path[sizeof("net/ipv6/conf/") + IFNAMSIZ];
 
+	if (idev && idev->dev && !netif_has_sysctl(idev->dev)) {
+		p->sysctl_header = NULL;
+		return 0;
+	}
+
 	table = kmemdup(addrconf_sysctl, sizeof(addrconf_sysctl), GFP_KERNEL);
 	if (!table)
 		goto out;
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 088e2b459d0f..7503d68da2ea 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -1251,6 +1251,9 @@ static int mpls_dev_sysctl_register(struct net_device *dev,
 	struct ctl_table *table;
 	int i;
 
+	if (!netif_has_sysctl(dev))
+		return 0;
+
 	table = kmemdup(&mpls_dev_table, sizeof(mpls_dev_table), GFP_KERNEL);
 	if (!table)
 		goto out;
@@ -1285,6 +1288,9 @@ static void mpls_dev_sysctl_unregister(struct net_device *dev,
 	struct net *net = dev_net(dev);
 	struct ctl_table *table;
 
+	if (!mdev->sysctl)
+		return;
+
 	table = mdev->sysctl->ctl_table_arg;
 	unregister_net_sysctl_table(mdev->sysctl);
 	kfree(table);
-- 
2.11.0 (Apple Git-81)

^ permalink raw reply related	[flat|nested] 17+ messages in thread
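
The deferral in patch 5/6 can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: every `model_*` name and the `MODEL_IFF_LWT_NETDEV` bit value are hypothetical stand-ins for `netif_is_lwd()`, `inetdev_init()` and the NETDEV_REGISTER / address-add paths touched by the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bit value; the real IFF_LWT_NETDEV comes from patch 3/6. */
#define MODEL_IFF_LWT_NETDEV (1u << 0)

struct model_in_device {
	int naddrs;
};

struct model_net_device {
	unsigned int priv_flags;
	struct model_in_device *ip_ptr; /* NULL until IPv4 state exists */
};

static bool model_is_lwd(const struct model_net_device *dev)
{
	return !!(dev->priv_flags & MODEL_IFF_LWT_NETDEV);
}

/* Stand-in for inetdev_init(): allocate the per-device IPv4 state. */
static struct model_in_device *model_inetdev_init(struct model_net_device *dev)
{
	dev->ip_ptr = calloc(1, sizeof(*dev->ip_ptr));
	return dev->ip_ptr;
}

/* Register path: lightweight devices skip the eager allocation. */
static int model_register(struct model_net_device *dev)
{
	assert(dev != NULL);
	if (model_is_lwd(dev))
		return 0;
	return model_inetdev_init(dev) ? 0 : -1;
}

/* Address-add path: allocate on demand, for lightweight devices only. */
static int model_add_addr(struct model_net_device *dev)
{
	struct model_in_device *in_dev = dev->ip_ptr;

	if (!in_dev && model_is_lwd(dev))
		in_dev = model_inetdev_init(dev);
	if (!in_dev)
		return -1; /* the patch fails with -ENOBUFS at this point */
	in_dev->naddrs++;
	return 0;
}
```

In this model a lightweight device carries no IPv4 state after `model_register()` and only gains it on the first `model_add_addr()` call, while a regular device is initialized eagerly as before.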

* [PATCH RFC net-next 6/6] net: add uapi for creating lightweight devices
  2017-05-06 16:07 [PATCH RFC net-next 0/6] net: reducing memory footprint of network devices David Ahern
                   ` (4 preceding siblings ...)
  2017-05-06 16:07 ` [PATCH RFC net-next 5/6] net: Delay initializations for lightweight devices David Ahern
@ 2017-05-06 16:07 ` David Ahern
  2017-05-08 17:35 ` [PATCH RFC net-next 0/6] net: reducing memory footprint of network devices Florian Fainelli
  6 siblings, 0 replies; 17+ messages in thread
From: David Ahern @ 2017-05-06 16:07 UTC (permalink / raw)
  To: netdev; +Cc: roopa, f.fainelli, nicolas.dichtel, David Ahern

Allow users to make new devices lightweight by setting IFLA_LWT_NETDEV
attribute in the newlink request.

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/uapi/linux/if_link.h |  1 +
 net/core/rtnetlink.c         | 10 +++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 8e56ac70e0d1..f57a16e542b7 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -157,6 +157,7 @@ enum {
 	IFLA_GSO_MAX_SIZE,
 	IFLA_PAD,
 	IFLA_XDP,
+	IFLA_LWT_NETDEV,
 	__IFLA_MAX
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a4db1cd91c4a..9c18e6dec379 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2378,6 +2378,7 @@ struct net_device *rtnl_create_link(struct net *net,
 	struct net_device *dev;
 	unsigned int num_tx_queues = 1;
 	unsigned int num_rx_queues = 1;
+	unsigned int flags = 0;
 
 	if (tb[IFLA_NUM_TX_QUEUES])
 		num_tx_queues = nla_get_u32(tb[IFLA_NUM_TX_QUEUES]);
@@ -2389,8 +2390,15 @@ struct net_device *rtnl_create_link(struct net *net,
 	else if (ops->get_num_rx_queues)
 		num_rx_queues = ops->get_num_rx_queues();
 
+	if (tb[IFLA_LWT_NETDEV]) {
+		u8 lwt_dev = !!nla_get_u8(tb[IFLA_LWT_NETDEV]);
+
+		if (lwt_dev)
+			flags |= IFF_LWT_NETDEV;
+	}
+
 	dev = alloc_netdev_mqs(ops->priv_size, ifname, name_assign_type,
-			       ops->setup, num_tx_queues, num_rx_queues, 0);
+			       ops->setup, num_tx_queues, num_rx_queues, flags);
 	if (!dev)
 		return ERR_PTR(-ENOMEM);
 
-- 
2.11.0 (Apple Git-81)

^ permalink raw reply related	[flat|nested] 17+ messages in thread
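
As a rough illustration of the rtnl_create_link() change in patch 6/6, the attribute handling can be modelled in userspace as below. This is a sketch, not the kernel code: the `MODEL_*` constants and `model_create_flags()` are hypothetical, and the attribute table is reduced to pointers at u8 payloads instead of real `struct nlattr` entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute indices and flag bit, for illustration only. */
enum {
	MODEL_IFLA_UNSPEC,
	MODEL_IFLA_LWT_NETDEV,
	MODEL_IFLA_COUNT
};
#define MODEL_IFF_LWT_NETDEV (1u << 0)

/*
 * tb[i] points at the u8 payload of attribute i, or is NULL when the
 * attribute was not present in the request.
 */
static unsigned int model_create_flags(const uint8_t *tb[])
{
	unsigned int flags = 0;

	assert(tb != NULL);
	/* Any non-zero payload selects a lightweight device. */
	if (tb[MODEL_IFLA_LWT_NETDEV] && *tb[MODEL_IFLA_LWT_NETDEV])
		flags |= MODEL_IFF_LWT_NETDEV;
	return flags;
}
```

The point of the model is that an absent attribute and an attribute carrying 0 both leave the device a regular netdevice, so existing userspace keeps its current behaviour.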

* Re: [PATCH RFC net-next 3/6] net: Introduce IFF_LWT_NETDEV flag
  2017-05-06 16:07 ` [PATCH RFC net-next 3/6] net: Introduce IFF_LWT_NETDEV flag David Ahern
@ 2017-05-08  8:55   ` Johannes Berg
  2017-05-08 20:11     ` David Miller
  0 siblings, 1 reply; 17+ messages in thread
From: Johannes Berg @ 2017-05-08  8:55 UTC (permalink / raw)
  To: David Ahern, netdev; +Cc: roopa, f.fainelli, nicolas.dichtel


> +static inline bool netif_is_lwd(struct net_device *dev)
> +{
> +	return !!(dev->priv_flags & IFF_LWT_NETDEV);
> +}

Am I the only one who thinks that this "LWT_NETDEV" vs "LWD" is a bit
confusing?

Is "netif_is_lwt_netdev()" really too long?

johannes

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC net-next 4/6] net: Do not intialize kobject for lightweight netdevs
  2017-05-06 16:07 ` [PATCH RFC net-next 4/6] net: Do not intialize kobject for lightweight netdevs David Ahern
@ 2017-05-08 17:26   ` Florian Fainelli
  0 siblings, 0 replies; 17+ messages in thread
From: Florian Fainelli @ 2017-05-08 17:26 UTC (permalink / raw)
  To: David Ahern, netdev; +Cc: roopa, nicolas.dichtel

On 05/06/2017 09:07 AM, David Ahern wrote:
> Lightweight netdevices are not added to sysfs; bypass kobject
> initialization.

I was wondering if we actually needed a flag to tell: this is a
lightweight device, but still let it show up in /sys. All use cases that
I have in mind (getting the physical port name etc. etc) can be done via
netlink which is not restricted even with LWT devices, so this sounds
reasonable. In case we need to revisit that, we can always add more
flags to control the lightweight devices creation and how this
percolates through the networking stack.

Thanks!
-- 
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC net-next 5/6] net: Delay initializations for lightweight devices
  2017-05-06 16:07 ` [PATCH RFC net-next 5/6] net: Delay initializations for lightweight devices David Ahern
@ 2017-05-08 17:31   ` Florian Fainelli
  0 siblings, 0 replies; 17+ messages in thread
From: Florian Fainelli @ 2017-05-08 17:31 UTC (permalink / raw)
  To: David Ahern, netdev; +Cc: roopa, nicolas.dichtel

On 05/06/2017 09:07 AM, David Ahern wrote:
> Delay ipv4 and ipv6 initializations on lightweight netdevices until an
> address is added to the device.
> 
> Skip sysctl initialization for neighbor path as well.

Yeah, thanks for including the sysctl initialization. One thing that my
earlier "L2 only" attempt also tried to solve was making the IFF_NOIPV4
and IFF_NOIPV6 flags volatile. In case you change your mind and end up
needing the IP stacks to be initialized, this ought to be possible at
some point. I did not get to test that part though.

AFAIR, some peculiar devices like 6lowpan (and to some extent the larger
802.15.4 family) may want to be IPv6 exclusively. This means we may have
a bit of overlap with flags like IFF_NOARP, (the proposed IFF_NOIPV6
before) and IFF_LWT_NETDEV.

Thanks!
-- 
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread
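
The sysctl bypass discussed here relies on a pairing that is easy to miss in the diff: once registration is skipped, the unregister side has to tolerate a missing handle, as the mpls hunk does. A minimal userspace sketch of that pairing follows; all `model_*` names are hypothetical and `calloc()`/`free()` merely stand in for the kernel's table registration and teardown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct model_sysctl_hdr {
	int registered;
};

struct model_mpls_dev {
	struct model_sysctl_hdr *sysctl; /* NULL when no table was registered */
};

/* Register path: lightweight devices return early and leave sysctl NULL. */
static int model_sysctl_register(bool lightweight, struct model_mpls_dev *mdev)
{
	assert(mdev != NULL);
	if (lightweight)
		return 0;

	mdev->sysctl = calloc(1, sizeof(*mdev->sysctl));
	if (!mdev->sysctl)
		return -1;
	mdev->sysctl->registered = 1;
	return 0;
}

/* Unregister path: must cope with the skipped registration above. */
static void model_sysctl_unregister(struct model_mpls_dev *mdev)
{
	assert(mdev != NULL);
	if (!mdev->sysctl)
		return;

	free(mdev->sysctl);
	mdev->sysctl = NULL;
}
```

Without the NULL check in the unregister path, tearing down a lightweight device would dereference a handle that was never created.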

* Re: [PATCH RFC net-next 0/6] net: reducing memory footprint of network devices
  2017-05-06 16:07 [PATCH RFC net-next 0/6] net: reducing memory footprint of network devices David Ahern
                   ` (5 preceding siblings ...)
  2017-05-06 16:07 ` [PATCH RFC net-next 6/6] net: add uapi for creating " David Ahern
@ 2017-05-08 17:35 ` Florian Fainelli
  2017-05-09  9:50   ` Nicolas Dichtel
  6 siblings, 1 reply; 17+ messages in thread
From: Florian Fainelli @ 2017-05-08 17:35 UTC (permalink / raw)
  To: David Ahern, netdev; +Cc: roopa, nicolas.dichtel

On 05/06/2017 09:07 AM, David Ahern wrote:
> As I have mentioned many times[1], at ~43+kB per instance the use of
> net_devices does not scale for deployments needing 10,000+ devices. At
> netconf 1.2 there was a discussion about using a net_device_common for
> the minimal set of common attributes with other structs built on top of
> that one for "full" devices. It provided a means for the code to know
> "non-standard" net_devices. Conceptually, that approach has its merits
> but it is not practical given the sweeping changes required to the code
> base. More importantly though struct net_device is not the problem; it
> weighs in at less than 2kB so reorganizing the code base around a
> refactored net_device is not going to solve the problem. The primary
> issue is all of the initializations done *because* it is a struct
> net_device -- kobject and sysfs and the protocols (e.g., ipv4, ipv6,
> mpls, neighbors).
> 
> So, how do you keep the desired attributes of a net device -- network
> addresses, xmit function, qdisc, netfilter rules, tcpdump -- while
> lowering the overhead of a net_device instance and without sweeping
> changes across net/ and drivers/net/?
> 
> This patch set introduces the concept of labeling net_devices as
> "lightweight", first mentioned at netdev 1.1 [1]. Users have to opt
> in to lightweight devices by passing a new attribute, IFLA_LWT_NETDEV,
> in the new link request. This lightweight tag is meant for virtual
> devices such as vlan, vrf, vti, and dummy where the user expects to
> create a lot of them and does not want the duplication of resources.
> Each device type can always opt out of a lightweight label if necessary
> by failing device creates.
> 
> Labeling a virtual device as "lightweight" reduces the footprint for
> device creation from ~43kB to ~6kB. That reduction in memory is obtained
> by:
> 1. no entry in sysfs
>    - kobject in net_device.device is not initialized
> 
> 2. no entry in procfs
>    - no sysctl option for these devices
> 
> 3. deferred ipv4, ipv6, mpls initialization
>    - network layer must be enabled before an address can be assigned
>      or mpls labels can be processed
>    - enables what Florian called L2 only devices [2]
> 
> Once the core premise of a lightweight device is accepted, follow on
> patches can reduce the overhead of network initializations. e.g.,
> 
> 1. remove devconf per device (ipv4 and ipv6)
>    - lightweight devices use the default settings rather than replicate
>      the same data for each device
> 
> 2. reduce / remove / opt out of snmp mibs
>    - snmp6_alloc_dev and icmpv6msg_mib_device specifically is a heavy
>      hitter
> 
> Patches can also be found here:
>     https://github.com/dsahern/linux lwt-dev-rfc
> 
> And iproute2 here:
>     https://github.com/dsahern/iproute2 lwt-dev
> 
> Example:
>     ip li add foo lwd type vrf table 123
> 
> - creates VRF device 'foo' as a lightweight netdevice.

This is really looking nice, thanks for posting this patch series! The
only submission wide comment I have is that the flag is named
IFF_LWT_NETDEV whereas the helper that checks for it is named
netif_is_lwd() so we should reconcile the two. Since there is an
existing lightweight tunnel infrastructure already, maybe using
IFF_LWD_NETDEV (or just IFF_LWD) would be good enough here?

> 
> 
> [1] http://www.netdevconf.org/1.1/proceedings/slides/ahern-aleksandrov-prabhu-scaling-network-cumulus.pdf
> [2] https://www.spinics.net/lists/netdev/msg340808.html
> David Ahern (6):
>   net: Add accessor for kboject in a net_device
>   net: Add flags argument to alloc_netdev_mqs
>   net: Introduce IFF_LWT_NETDEV flag
>   net: Do not intialize kobject for lightweight netdevs
>   net: Delay initializations for lightweight devices
>   net: add uapi for creating lightweight devices
> 
>  drivers/net/ethernet/mellanox/mlx5/core/ipoib.c |  2 +-
>  drivers/net/ethernet/tile/tilegx.c              |  2 +-
>  drivers/net/tun.c                               |  2 +-
>  drivers/net/wireless/marvell/mwifiex/cfg80211.c |  2 +-
>  include/linux/netdevice.h                       | 27 ++++++++--
>  include/uapi/linux/if_link.h                    |  1 +
>  net/batman-adv/sysfs.c                          | 13 ++++-
>  net/bridge/br_if.c                              | 12 +++--
>  net/bridge/br_sysfs_br.c                        | 17 +++---
>  net/bridge/br_sysfs_if.c                        |  8 ++-
>  net/core/dev.c                                  | 71 ++++++++++++++++++-------
>  net/core/neighbour.c                            |  3 ++
>  net/core/net-sysfs.c                            | 25 ++++++---
>  net/core/rtnetlink.c                            | 10 +++-
>  net/ethernet/eth.c                              |  2 +-
>  net/ipv4/devinet.c                              | 18 ++++++-
>  net/ipv6/addrconf.c                             |  9 ++++
>  net/mac80211/iface.c                            |  2 +-
>  net/mpls/af_mpls.c                              |  6 +++
>  net/wireless/core.c                             | 15 ++++--
>  20 files changed, 190 insertions(+), 57 deletions(-)
> 


-- 
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC net-next 3/6] net: Introduce IFF_LWT_NETDEV flag
  2017-05-08  8:55   ` Johannes Berg
@ 2017-05-08 20:11     ` David Miller
  2017-05-08 21:37       ` Roopa Prabhu
  2017-05-09  0:57       ` David Ahern
  0 siblings, 2 replies; 17+ messages in thread
From: David Miller @ 2017-05-08 20:11 UTC (permalink / raw)
  To: johannes; +Cc: dsahern, netdev, roopa, f.fainelli, nicolas.dichtel

From: Johannes Berg <johannes@sipsolutions.net>
Date: Mon, 08 May 2017 10:55:12 +0200

> 
>> +static inline bool netif_is_lwd(struct net_device *dev)
>> +{
>> +	return !!(dev->priv_flags & IFF_LWT_NETDEV);
>> +}
> 
> Am I the only one who thinks that this "LWT_NETDEV" vs "LWD" is a bit
> confusing?

Agreed, my old eyes can't discern them at a distance :-)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC net-next 3/6] net: Introduce IFF_LWT_NETDEV flag
  2017-05-08 20:11     ` David Miller
@ 2017-05-08 21:37       ` Roopa Prabhu
  2017-05-09  0:57       ` David Ahern
  1 sibling, 0 replies; 17+ messages in thread
From: Roopa Prabhu @ 2017-05-08 21:37 UTC (permalink / raw)
  To: David Miller
  Cc: Johannes Berg, David Ahern, netdev, Florian Fainelli, Nicolas Dichtel

On Mon, May 8, 2017 at 1:11 PM, David Miller <davem@davemloft.net> wrote:
> From: Johannes Berg <johannes@sipsolutions.net>
> Date: Mon, 08 May 2017 10:55:12 +0200
>
>>
>>> +static inline bool netif_is_lwd(struct net_device *dev)
>>> +{
>>> +    return !!(dev->priv_flags & IFF_LWT_NETDEV);
>>> +}
>>
>> Am I the only one who thinks that this "LWT_NETDEV" vs "LWD" is a bit
>> confusing?
>
> Agreed, my old eyes can't discern them at a distance :-)


agree.

mix of LWT_NETDEV and LWD can get confusing.

LWT already stands for Light Weight Tunnel, so this can only be LWD or
LWN ;) ... if people don't confuse it with some weekly news device :)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC net-next 3/6] net: Introduce IFF_LWT_NETDEV flag
  2017-05-08 20:11     ` David Miller
  2017-05-08 21:37       ` Roopa Prabhu
@ 2017-05-09  0:57       ` David Ahern
  2017-05-09  5:04         ` Roopa Prabhu
  1 sibling, 1 reply; 17+ messages in thread
From: David Ahern @ 2017-05-09  0:57 UTC (permalink / raw)
  To: David Miller, johannes; +Cc: netdev, roopa, f.fainelli, nicolas.dichtel

On 5/8/17 1:11 PM, David Miller wrote:
> From: Johannes Berg <johannes@sipsolutions.net>
> Date: Mon, 08 May 2017 10:55:12 +0200
> 
>>
>>> +static inline bool netif_is_lwd(struct net_device *dev)
>>> +{
>>> +	return !!(dev->priv_flags & IFF_LWT_NETDEV);
>>> +}
>>
>> Am I the only one who thinks that this "LWT_NETDEV" vs "LWD" is a bit
>> confusing?
> 
> Agreed, my old eyes can't discern them at a distance :-)
> 

perhaps it is the tiny font your old eyes are having trouble with :-)

I am fine with Johannes' suggestion -- just spell it out:
    netif_is_lwt_netdev

where lwt = LightWeighT

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC net-next 3/6] net: Introduce IFF_LWT_NETDEV flag
  2017-05-09  0:57       ` David Ahern
@ 2017-05-09  5:04         ` Roopa Prabhu
  0 siblings, 0 replies; 17+ messages in thread
From: Roopa Prabhu @ 2017-05-09  5:04 UTC (permalink / raw)
  To: David Ahern
  Cc: David Miller, Johannes Berg, netdev, Florian Fainelli, Nicolas Dichtel

On Mon, May 8, 2017 at 5:57 PM, David Ahern <dsahern@gmail.com> wrote:
> On 5/8/17 1:11 PM, David Miller wrote:
>> From: Johannes Berg <johannes@sipsolutions.net>
>> Date: Mon, 08 May 2017 10:55:12 +0200
>>
>>>
>>>> +static inline bool netif_is_lwd(struct net_device *dev)
>>>> +{
>>>> +   return !!(dev->priv_flags & IFF_LWT_NETDEV);
>>>> +}
>>>
>>> Am I the only one who thinks that this "LWT_NETDEV" vs "LWD" is a bit
>>> confusing?
>>
>> Agreed, my old eyes can't discern them at a distance :-)
>>
>
> perhaps it is the tiny font your old eyes are having trouble with :-)
>
> I am fine with Johannes' suggestion -- just spell it out:
>     netif_is_lwt_netdev
>
> where lwt = LightWeighT

makes sense... but this does sound like a 'light weight tunnel
netdevice' though, just because 'LWT' already expands to 'light
weight tunnel'

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC net-next 0/6] net: reducing memory footprint of network devices
  2017-05-08 17:35 ` [PATCH RFC net-next 0/6] net: reducing memory footprint of network devices Florian Fainelli
@ 2017-05-09  9:50   ` Nicolas Dichtel
  2017-05-09 15:42     ` David Ahern
  0 siblings, 1 reply; 17+ messages in thread
From: Nicolas Dichtel @ 2017-05-09  9:50 UTC (permalink / raw)
  To: Florian Fainelli, David Ahern, netdev; +Cc: roopa

Le 08/05/2017 à 19:35, Florian Fainelli a écrit :
> On 05/06/2017 09:07 AM, David Ahern wrote:
>> As I have mentioned many times[1], at ~43+kB per instance the use of
>> net_devices does not scale for deployments needing 10,000+ devices. At
>> netconf 1.2 there was a discussion about using a net_device_common for
>> the minimal set of common attributes with other structs built on top of
>> that one for "full" devices. It provided a means for the code to know
>> "non-standard" net_devices. Conceptually, that approach has its merits
>> but it is not practical given the sweeping changes required to the code
>> base. More importantly though struct net_device is not the problem; it
>> weighs in at less than 2kB so reorganizing the code base around a
>> refactored net_device is not going to solve the problem. The primary
>> issue is all of the initializations done *because* it is a struct
>> net_device -- kobject and sysfs and the protocols (e.g., ipv4, ipv6,
>> mpls, neighbors).
>>
>> So, how do you keep the desired attributes of a net device -- network
>> addresses, xmit function, qdisc, netfilter rules, tcpdump -- while
>> lowering the overhead of a net_device instance and without sweeping
>> changes across net/ and drivers/net/?
>>
>> This patch set introduces the concept of labeling net_devices as
>> "lightweight", first mentioned at netdev 1.1 [1]. Users have to opt
>> in to lightweight devices by passing a new attribute, IFLA_LWT_NETDEV,
>> in the new link request. This lightweight tag is meant for virtual
>> devices such as vlan, vrf, vti, and dummy where the user expects to
>> create a lot of them and does not want the duplication of resources.
>> Each device type can always opt out of a lightweight label if necessary
>> by failing device creates.
>>
>> Labeling a virtual device as "lightweight" reduces the footprint for
>> device creation from ~43kB to ~6kB. That reduction in memory is obtained
>> by:
>> 1. no entry in sysfs
>>    - kobject in net_device.device is not initialized
>>
>> 2. no entry in procfs
>>    - no sysctl option for these devices
>>
>> 3. deferred ipv4, ipv6, mpls initialization
>>    - network layer must be enabled before an address can be assigned
>>      or mpls labels can be processed
>>    - enables what Florian called L2 only devices [2]
>>
>> Once the core premise of a lightweight device is accepted, follow on
>> patches can reduce the overhead of network initializations. e.g.,
>>
>> 1. remove devconf per device (ipv4 and ipv6)
>>    - lightweight devices use the default settings rather than replicate
>>      the same data for each device
>>
>> 2. reduce / remove / opt out of snmp mibs
>>    - snmp6_alloc_dev and icmpv6msg_mib_device specifically is a heavy
>>      hitter
>>
>> Patches can also be found here:
>>     https://github.com/dsahern/linux lwt-dev-rfc
>>
>> And iproute2 here:
>>     https://github.com/dsahern/iproute2 lwt-dev
>>
>> Example:
>>     ip li add foo lwd type vrf table 123
>>
>> - creates VRF device 'foo' as a lightweight netdevice.
> 
> This is really looking nice, thanks for posting this patch series! The
> only submission wide comment I have is that the flag is named
> IFF_LWT_NETDEV whereas the helper that checks for it is named
> netif_is_lwd() so we should reconcile the two. Since there is an
> existing lightweight tunnel infrastructure already, maybe using
> IFF_LWD_NETDEV (or just IFF_LWD) would be good enough here?

Yep, thank you for the series, it also looks good to me.
I also vote for the IFF_LWD_NETDEV or IFF_LWD to avoid confusion with
lightweight tunnel and to be consistent with it (lightweight was abbreviated lw,
not lwt ;-)).

Your initial patch tried to make those interfaces transparent, this is not the
case anymore here. It would probably be useful to be able to filter those
interfaces in the kernel during a dump.

Regards,
Nicolas

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH RFC net-next 0/6] net: reducing memory footprint of network devices
  2017-05-09  9:50   ` Nicolas Dichtel
@ 2017-05-09 15:42     ` David Ahern
  0 siblings, 0 replies; 17+ messages in thread
From: David Ahern @ 2017-05-09 15:42 UTC (permalink / raw)
  To: nicolas.dichtel, Florian Fainelli, netdev; +Cc: roopa

On 5/9/17 2:50 AM, Nicolas Dichtel wrote:
> Your initial patch tried to make those interfaces transparent, this is not the
> case anymore here. It would probably be useful to be able to filter those
> interfaces in the kernel during a dump.

The earlier email was for hidden devices; the intent there is to hide
certain devices (e.g., switch control netdevs) from user dumps by default.

Adding an attribute at create time such as IFF_INVISIBLE for such
devices would be a follow-on to this set, leveraging the same sysctl
and sysfs bypasses.

^ permalink raw reply	[flat|nested] 17+ messages in thread
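
The dump filtering mentioned above is not part of this series. Purely as a sketch of the idea, a dump that hides flagged devices by default could look like the userspace model below; `MODEL_IFF_INVISIBLE`, `struct model_dev` and `model_dump()` are all hypothetical names, and the device list is a plain array instead of the kernel's per-namespace device list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for devices hidden from default dumps. */
#define MODEL_IFF_INVISIBLE (1u << 1)

struct model_dev {
	const char *name;
	unsigned int priv_flags;
};

/*
 * Copy the names of dumped devices into out[] (sized for n entries) and
 * return how many were dumped. Hidden devices are skipped unless the
 * caller explicitly asks for them.
 */
static size_t model_dump(const struct model_dev *devs, size_t n,
			 bool show_hidden, const char **out)
{
	size_t count = 0;

	assert(devs != NULL && out != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!show_hidden && (devs[i].priv_flags & MODEL_IFF_INVISIBLE))
			continue;
		out[count++] = devs[i].name;
	}
	return count;
}
```

Doing the filtering on the kernel side, as suggested in the thread, would keep a default dump small even when thousands of such devices exist.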

Thread overview: 17+ messages
2017-05-06 16:07 [PATCH RFC net-next 0/6] net: reducing memory footprint of network devices David Ahern
2017-05-06 16:07 ` [PATCH RFC net-next 1/6] net: Add accessor for kboject in a net_device David Ahern
2017-05-06 16:07 ` [PATCH RFC net-next 2/6] net: Add flags argument to alloc_netdev_mqs David Ahern
2017-05-06 16:07 ` [PATCH RFC net-next 3/6] net: Introduce IFF_LWT_NETDEV flag David Ahern
2017-05-08  8:55   ` Johannes Berg
2017-05-08 20:11     ` David Miller
2017-05-08 21:37       ` Roopa Prabhu
2017-05-09  0:57       ` David Ahern
2017-05-09  5:04         ` Roopa Prabhu
2017-05-06 16:07 ` [PATCH RFC net-next 4/6] net: Do not intialize kobject for lightweight netdevs David Ahern
2017-05-08 17:26   ` Florian Fainelli
2017-05-06 16:07 ` [PATCH RFC net-next 5/6] net: Delay initializations for lightweight devices David Ahern
2017-05-08 17:31   ` Florian Fainelli
2017-05-06 16:07 ` [PATCH RFC net-next 6/6] net: add uapi for creating " David Ahern
2017-05-08 17:35 ` [PATCH RFC net-next 0/6] net: reducing memory footprint of network devices Florian Fainelli
2017-05-09  9:50   ` Nicolas Dichtel
2017-05-09 15:42     ` David Ahern
