* [PATCH net-next 0/5] mpls: Behaviour-changing improvements
@ 2015-03-19 21:32 Robert Shearman
  2015-03-19 21:32 ` [PATCH net-next 1/5] mpls: Use definition for reserved label checks Robert Shearman
                   ` (5 more replies)
  0 siblings, 6 replies; 68+ messages in thread
From: Robert Shearman @ 2015-03-19 21:32 UTC (permalink / raw)
  To: davem; +Cc: netdev, Robert Shearman

This series consists of several small changes to make it easier to
understand the code, along with security and RFC-compliance
changes. These are important to consider before userspace begins
relying on the previous behaviour.

Robert Shearman (5):
  mpls: Use definition for reserved label checks
  mpls: Remove incorrect PHP comment
  mpls: Differentiate implicit-null and unlabeled neighbours
  mpls: Per-device enabling of packet forwarding
  mpls: Allow payload type to be associated with label routes

 Documentation/networking/mpls-sysctl.txt |   9 ++
 include/linux/netdevice.h                |   4 +
 net/mpls/af_mpls.c                       | 242 +++++++++++++++++++++++++------
 net/mpls/internal.h                      |   7 +
 4 files changed, 215 insertions(+), 47 deletions(-)

-- 
2.1.4


* [PATCH net-next 1/5] mpls: Use definition for reserved label checks
  2015-03-19 21:32 [PATCH net-next 0/5] mpls: Behaviour-changing improvements Robert Shearman
@ 2015-03-19 21:32 ` Robert Shearman
  2015-03-20  0:41   ` Eric W. Biederman
  2015-03-19 21:32 ` [PATCH net-next 2/5] mpls: Remove incorrect PHP comment Robert Shearman
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 68+ messages in thread
From: Robert Shearman @ 2015-03-19 21:32 UTC (permalink / raw)
  To: davem; +Cc: netdev, Robert Shearman, Eric W. Biederman

In multiple locations there are checks for whether the label in hand
is a reserved label or not using the arbitrary value of 16. Factor
this out into a #define for better maintainability and for
documentation.
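
For illustration only, here is a minimal userspace sketch of the idea.
The two helper functions are hypothetical and are not part of this
patch; only the LABEL_FIRST_UNRESERVED name and its value of 16 come
from the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same name and value as the definition added by this patch:
 * labels 0-15 are reserved, 16 is the first usable label.
 */
#define LABEL_FIRST_UNRESERVED 16

/* Hypothetical helper (not in the patch): one place for the check. */
static bool mpls_label_is_reserved(uint32_t label)
{
	return label < LABEL_FIRST_UNRESERVED;
}

/* Hypothetical helper: clamp a dump start index past the reserved
 * range, in the same way mpls_dump_routes() does in the diff.
 */
static uint32_t mpls_first_dump_index(uint32_t index)
{
	if (mpls_label_is_reserved(index))
		index = LABEL_FIRST_UNRESERVED;
	assert(!mpls_label_is_reserved(index));
	return index;
}
```

With a single definition, a later change to the reserved range would
touch one line rather than every comparison.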

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 net/mpls/af_mpls.c  | 20 ++++++++++----------
 net/mpls/internal.h |  1 +
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index db8a2ea..0d6763a 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -276,7 +276,7 @@ static void mpls_notify_route(struct net *net, unsigned index,
 	struct mpls_route *rt = new ? new : old;
 	unsigned nlm_flags = (old && new) ? NLM_F_REPLACE : 0;
 	/* Ignore reserved labels for now */
-	if (rt && (index >= 16))
+	if (rt && (index >= LABEL_FIRST_UNRESERVED))
 		rtmsg_lfib(event, index, rt, nlh, net, portid, nlm_flags);
 }
 
@@ -310,7 +310,7 @@ static unsigned find_free_label(struct net *net)
 
 	platform_label = rtnl_dereference(net->mpls.platform_label);
 	platform_labels = net->mpls.platform_labels;
-	for (index = 16; index < platform_labels; index++) {
+	for (index = LABEL_FIRST_UNRESERVED; index < platform_labels; index++) {
 		if (!rtnl_dereference(platform_label[index]))
 			return index;
 	}
@@ -335,8 +335,8 @@ static int mpls_route_add(struct mpls_route_config *cfg)
 		index = find_free_label(net);
 	}
 
-	/* The first 16 labels are reserved, and may not be set */
-	if (index < 16)
+	/* Reserved labels may not be set */
+	if (index < LABEL_FIRST_UNRESERVED)
 		goto errout;
 
 	/* The full 20 bit range may not be supported. */
@@ -413,8 +413,8 @@ static int mpls_route_del(struct mpls_route_config *cfg)
 
 	index = cfg->rc_label;
 
-	/* The first 16 labels are reserved, and may not be removed */
-	if (index < 16)
+	/* Reserved labels may not be removed */
+	if (index < LABEL_FIRST_UNRESERVED)
 		goto errout;
 
 	/* The full 20 bit range may not be supported */
@@ -610,8 +610,8 @@ static int rtm_to_route_config(struct sk_buff *skb,  struct nlmsghdr *nlh,
 					   &cfg->rc_label))
 				goto errout;
 
-			/* The first 16 labels are reserved, and may not be set */
-			if (cfg->rc_label < 16)
+			/* Reserved labels may not be set */
+			if (cfg->rc_label < LABEL_FIRST_UNRESERVED)
 				goto errout;
 
 			break;
@@ -736,8 +736,8 @@ static int mpls_dump_routes(struct sk_buff *skb, struct netlink_callback *cb)
 	ASSERT_RTNL();
 
 	index = cb->args[0];
-	if (index < 16)
-		index = 16;
+	if (index < LABEL_FIRST_UNRESERVED)
+		index = LABEL_FIRST_UNRESERVED;
 
 	platform_label = rtnl_dereference(net->mpls.platform_label);
 	platform_labels = net->mpls.platform_labels;
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index fb6de92..d06dff9 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -9,6 +9,7 @@
 #define LABEL_GAL			13 /* RFC5586 */
 #define LABEL_OAM_ALERT			14 /* RFC3429 */
 #define LABEL_EXTENSION			15 /* RFC7274 */
+#define LABEL_FIRST_UNRESERVED		16 /* RFC7274 */
 
 
 struct mpls_shim_hdr {
-- 
2.1.4


* [PATCH net-next 2/5] mpls: Remove incorrect PHP comment
  2015-03-19 21:32 [PATCH net-next 0/5] mpls: Behaviour-changing improvements Robert Shearman
  2015-03-19 21:32 ` [PATCH net-next 1/5] mpls: Use definition for reserved label checks Robert Shearman
@ 2015-03-19 21:32 ` Robert Shearman
  2015-03-19 21:32 ` [PATCH net-next 3/5] mpls: Differentiate implicit-null and unlabeled neighbours Robert Shearman
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-03-19 21:32 UTC (permalink / raw)
  To: davem; +Cc: netdev, Robert Shearman, Eric W. Biederman

Popping the last label on the stack does not necessarily imply
performing penultimate hop popping. There is no reason why this
couldn't be the last hop in the network, so remove the comment.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 net/mpls/af_mpls.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 0d6763a..bf3459a 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -199,7 +199,6 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	skb->protocol = htons(ETH_P_MPLS_UC);
 
 	if (unlikely(!new_header_size && dec.bos)) {
-		/* Penultimate hop popping */
 		if (!mpls_egress(rt, skb, dec))
 			goto drop;
 	} else {
-- 
2.1.4


* [PATCH net-next 3/5] mpls: Differentiate implicit-null and unlabeled neighbours
  2015-03-19 21:32 [PATCH net-next 0/5] mpls: Behaviour-changing improvements Robert Shearman
  2015-03-19 21:32 ` [PATCH net-next 1/5] mpls: Use definition for reserved label checks Robert Shearman
  2015-03-19 21:32 ` [PATCH net-next 2/5] mpls: Remove incorrect PHP comment Robert Shearman
@ 2015-03-19 21:32 ` Robert Shearman
  2015-03-19 21:32 ` [PATCH net-next 4/5] mpls: Per-device enabling of packet forwarding Robert Shearman
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-03-19 21:32 UTC (permalink / raw)
  To: davem; +Cc: netdev, Robert Shearman, Eric W. Biederman

The control plane can advertise labels for neighbours that don't have
an outgoing label. RFC 3031 s3.22 states that either the remaining
labels should be popped (if the control plane can determine that it's
safe to do so, which in light of MPLS-VPN, RFC 4364, is never the case
now) or that the packet should be discarded.

Therefore, if the peer is unlabeled and the last label wasn't popped
then drop the packet. The peer being unlabeled is signalled by an
empty label stack. However, implicit-null still needs to be supported
(i.e. penultimate hop popping) where the incoming label is popped and
no labels are put on and the packet can still go out labeled with the
unpopped part of the stack. This is achieved by the control plane
specifying a label stack consisting of the single special
implicit-null value.
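
As a rough userspace sketch of the resulting decision (not the kernel
code itself): struct route_sketch, route_from_config() and classify()
are made-up names for illustration, and only the implicit-null value
of 3 (RFC 3032) and the order of the checks are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LABEL_IMPLICIT_NULL 3 /* RFC3032 */

enum fwd_action { FWD_EGRESS, FWD_DROP, FWD_LABELED };

/* Hypothetical stand-in for the two fields the patch adds/uses. */
struct route_sketch {
	bool unlabeled;       /* peer advertised no outgoing label */
	unsigned int labels;  /* number of labels to push */
};

/* Mirrors the mpls_route_add() hunk: a stack holding only
 * implicit-null means "push nothing", an empty stack means the
 * peer is unlabeled.
 */
static struct route_sketch route_from_config(const uint32_t *out,
					     unsigned int n)
{
	struct route_sketch rt = { false, 0 };

	if (n == 1 && out[0] == LABEL_IMPLICIT_NULL)
		return rt;
	rt.labels = n;
	rt.unlabeled = (n == 0);
	return rt;
}

/* Mirrors the order of checks in the mpls_forward() hunk.
 * bos: the popped label was the bottom of the stack.
 */
static enum fwd_action classify(const struct route_sketch *rt, bool bos)
{
	assert(rt);
	if (rt->labels == 0 && bos)
		return FWD_EGRESS;   /* hand the payload to IP */
	if (rt->unlabeled)
		return FWD_DROP;     /* labels remain, peer is unlabeled */
	return FWD_LABELED;          /* swap, or PHP with labels left */
}
```

The point of the sketch is that both cases push zero labels, so the
route needs a separate flag to tell them apart.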

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 net/mpls/af_mpls.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index bf3459a..e3586a7 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -28,7 +28,8 @@ struct mpls_route { /* next hop label forwarding entry */
 	struct rcu_head		rt_rcu;
 	u32			rt_label[MAX_NEW_LABELS];
 	u8			rt_protocol; /* routing protocol that set this entry */
-	u8			rt_labels;
+	u8                      rt_unlabeled : 1;
+	u8			rt_labels : 7;
 	u8			rt_via_alen;
 	u8			rt_via_table;
 	u8			rt_via[0];
@@ -201,6 +202,11 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	if (unlikely(!new_header_size && dec.bos)) {
 		if (!mpls_egress(rt, skb, dec))
 			goto drop;
+	} else if (rt->rt_unlabeled) {
+		/* Labeled traffic destined to unlabeled peer should
+		 * be discarded
+		 */
+		goto drop;
 	} else {
 		bool bos;
 		int i;
@@ -385,9 +391,16 @@ static int mpls_route_add(struct mpls_route_config *cfg)
 	if (!rt)
 		goto errout;
 
-	rt->rt_labels = cfg->rc_output_labels;
-	for (i = 0; i < rt->rt_labels; i++)
-		rt->rt_label[i] = cfg->rc_output_label[i];
+	if (cfg->rc_output_labels == 1 &&
+	    cfg->rc_output_label[0] == LABEL_IMPLICIT_NULL) {
+		rt->rt_labels = 0;
+	} else {
+		rt->rt_labels = cfg->rc_output_labels;
+		for (i = 0; i < rt->rt_labels; i++)
+			rt->rt_label[i] = cfg->rc_output_label[i];
+		if (!rt->rt_labels)
+			rt->rt_unlabeled = true;
+	}
 	rt->rt_protocol = cfg->rc_protocol;
 	RCU_INIT_POINTER(rt->rt_dev, dev);
 	rt->rt_via_table = cfg->rc_via_table;
-- 
2.1.4


* [PATCH net-next 4/5] mpls: Per-device enabling of packet forwarding
  2015-03-19 21:32 [PATCH net-next 0/5] mpls: Behaviour-changing improvements Robert Shearman
                   ` (2 preceding siblings ...)
  2015-03-19 21:32 ` [PATCH net-next 3/5] mpls: Differentiate implicit-null and unlabeled neighbours Robert Shearman
@ 2015-03-19 21:32 ` Robert Shearman
  2015-03-19 21:32 ` [PATCH net-next 5/5] mpls: Allow payload type to be associated with label routes Robert Shearman
  2015-03-20 15:42 ` [PATCH net-next v2 0/5] mpls: Behaviour-changing improvements Robert Shearman
  5 siblings, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-03-19 21:32 UTC (permalink / raw)
  To: davem; +Cc: netdev, Robert Shearman, Eric W. Biederman

An MPLS network is a single trust domain where the edges must be in
control of what labels make their way into the core. The simplest way
of ensuring this is for the edge device to always impose the labels,
and never forward labeled traffic from untrusted neighbours. This is
achieved by allowing a per-device configuration of whether MPLS
traffic received over that interface should be forwarded or not.

To be secure by default, MPLS is now initially disabled on all
interfaces (except the loopback) until explicitly enabled and no
global option is provided to change the default. Whilst this differs
from other protocols (e.g. IPv6), network operators are used to
explicitly enabling MPLS forwarding on interfaces, and with the number
of links to the MPLS core typically fairly low this doesn't present
too much of a burden on operators.
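
A small userspace sketch of the default and of the receive-side check
follows. The *_sketch names and SKETCH_IFF_LOOPBACK are stand-ins
invented for illustration; the real code in the diff uses struct
mpls_dev, IFF_LOOPBACK and mpls_dev_get().

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's IFF_LOOPBACK device flag. */
#define SKETCH_IFF_LOOPBACK 0x8

/* Stand-in for struct mpls_dev from the diff. */
struct mpls_dev_sketch {
	int fwd_enabled;
};

/* Default policy: off everywhere except loopback. */
static void mpls_dev_sketch_init(struct mpls_dev_sketch *mdev,
				 unsigned int dev_flags)
{
	assert(mdev);
	mdev->fwd_enabled = (dev_flags & SKETCH_IFF_LOOPBACK) ? 1 : 0;
}

/* Receive-side check: a device with no MPLS state, or with
 * forwarding disabled, has its labeled packets dropped.
 */
static bool mpls_input_allowed(const struct mpls_dev_sketch *mdev)
{
	return mdev && mdev->fwd_enabled;
}
```

An operator would then flip fwd_enabled per interface through the
conf/<interface>/forwarding sysctl documented in the diff.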

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 Documentation/networking/mpls-sysctl.txt |   9 +++
 include/linux/netdevice.h                |   4 ++
 net/mpls/af_mpls.c                       | 115 ++++++++++++++++++++++++++++++-
 net/mpls/internal.h                      |   6 ++
 4 files changed, 133 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/mpls-sysctl.txt b/Documentation/networking/mpls-sysctl.txt
index 639ddf0..f48772c 100644
--- a/Documentation/networking/mpls-sysctl.txt
+++ b/Documentation/networking/mpls-sysctl.txt
@@ -18,3 +18,12 @@ platform_labels - INTEGER
 
 	Possible values: 0 - 1048575
 	Default: 0
+
+conf/<interface>/forwarding - BOOL
+	Forward packets received on this interface.
+
+	If disabled, packets will be discarded without further
+	processing.
+
+	0 - disabled (default)
+	not 0 - enabled
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 76951c5..ee4ca06 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -60,6 +60,7 @@ struct phy_device;
 struct wireless_dev;
 /* 802.15.4 specific */
 struct wpan_dev;
+struct mpls_dev;
 
 void netdev_set_default_ethtool_ops(struct net_device *dev,
 				    const struct ethtool_ops *ops);
@@ -1615,6 +1616,9 @@ struct net_device {
 	void			*ax25_ptr;
 	struct wireless_dev	*ieee80211_ptr;
 	struct wpan_dev		*ieee802154_ptr;
+#if IS_ENABLED(CONFIG_MPLS_ROUTING)
+	struct mpls_dev __rcu	*mpls_ptr;
+#endif
 
 /*
  * Cache lines mostly used on receive path (including eth_type_trans())
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index e3586a7..14c7e76 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -54,6 +54,11 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
 	return rt;
 }
 
+static inline struct mpls_dev *mpls_dev_get(const struct net_device *dev)
+{
+	return rcu_dereference_rtnl(dev->mpls_ptr);
+}
+
 static bool mpls_output_possible(const struct net_device *dev)
 {
 	return dev && (dev->flags & IFF_UP) && netif_carrier_ok(dev);
@@ -137,6 +142,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	struct mpls_route *rt;
 	struct mpls_entry_decoded dec;
 	struct net_device *out_dev;
+	struct mpls_dev *mdev;
 	unsigned int hh_len;
 	unsigned int new_header_size;
 	unsigned int mtu;
@@ -144,6 +150,10 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 
 	/* Careful this entire function runs inside of an rcu critical section */
 
+	mdev = mpls_dev_get(dev);
+	if (!mdev || !mdev->fwd_enabled)
+		goto drop;
+
 	if (skb->pkt_type != PACKET_HOST)
 		goto drop;
 
@@ -440,10 +450,96 @@ errout:
 	return err;
 }
 
+#define MPLS_PERDEV_SYSCTL_OFFSET(field)	\
+	(&((struct mpls_dev *)0)->field)
+
+static const struct ctl_table mpls_dev_table[] = {
+	{
+		.procname	= "forwarding",
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+		.data		= MPLS_PERDEV_SYSCTL_OFFSET(fwd_enabled),
+	},
+	{ }
+};
+
+static int mpls_dev_sysctl_register(struct net_device *dev,
+				    struct mpls_dev *mdev)
+{
+	char path[sizeof("net/mpls/conf/") + IFNAMSIZ];
+	struct ctl_table *table;
+	int i;
+
+	table = kmemdup(&mpls_dev_table, sizeof(mpls_dev_table), GFP_KERNEL);
+	if (!table)
+		goto out;
+
+	/* Table data contains only offsets relative to the base of
+	 * the mdev at this point, so make them absolute.
+	 */
+	for (i = 0; i < ARRAY_SIZE(mpls_dev_table); i++)
+		table[i].data = (char *)mdev + (uintptr_t)table[i].data;
+
+	snprintf(path, sizeof(path), "net/mpls/conf/%s", dev->name);
+
+	mdev->sysctl = register_net_sysctl(dev_net(dev), path, table);
+	if (!mdev->sysctl)
+		goto free;
+
+	return 0;
+
+free:
+	kfree(table);
+out:
+	return -ENOBUFS;
+}
+
+static void mpls_dev_sysctl_unregister(struct mpls_dev *mdev)
+{
+	struct ctl_table *table;
+
+	table = mdev->sysctl->ctl_table_arg;
+	unregister_net_sysctl_table(mdev->sysctl);
+	kfree(table);
+}
+
+static struct mpls_dev *mpls_add_dev(struct net_device *dev)
+{
+	struct mpls_dev *mdev;
+	int err = -ENOMEM;
+
+	ASSERT_RTNL();
+
+	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
+	if (!mdev)
+		return ERR_PTR(err);
+
+	/* Enable MPLS by default on loopback devices, since this
+	 * doesn't represent a security boundary and is required for the
+	 * lookup of inner labels for LSPs terminating on this router.
+	 */
+	if (dev->flags & IFF_LOOPBACK)
+		mdev->fwd_enabled = 1;
+
+	err = mpls_dev_sysctl_register(dev, mdev);
+	if (err)
+		goto free;
+
+	rcu_assign_pointer(dev->mpls_ptr, mdev);
+
+	return mdev;
+
+free:
+	kfree(mdev);
+	return ERR_PTR(err);
+}
+
 static void mpls_ifdown(struct net_device *dev)
 {
 	struct mpls_route __rcu **platform_label;
 	struct net *net = dev_net(dev);
+	struct mpls_dev *mdev;
 	unsigned index;
 
 	platform_label = rtnl_dereference(net->mpls.platform_label);
@@ -455,14 +551,31 @@ static void mpls_ifdown(struct net_device *dev)
 			continue;
 		rt->rt_dev = NULL;
 	}
+
+	mdev = mpls_dev_get(dev);
+	if (!mdev)
+		return;
+
+	mpls_dev_sysctl_unregister(mdev);
+
+	RCU_INIT_POINTER(dev->mpls_ptr, NULL);
+
+	kfree(mdev);
 }
 
 static int mpls_dev_notify(struct notifier_block *this, unsigned long event,
 			   void *ptr)
 {
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct mpls_dev *mdev;
 
 	switch(event) {
+	case NETDEV_REGISTER:
+		mdev = mpls_add_dev(dev);
+		if (IS_ERR(mdev))
+			return notifier_from_errno(PTR_ERR(mdev));
+		break;
+
 	case NETDEV_UNREGISTER:
 		mpls_ifdown(dev);
 		break;
@@ -924,7 +1037,7 @@ static int mpls_platform_labels(struct ctl_table *table, int write,
 	return ret;
 }
 
-static struct ctl_table mpls_table[] = {
+static const struct ctl_table mpls_table[] = {
 	{
 		.procname	= "platform_labels",
 		.data		= NULL,
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index d06dff9..b839f5c 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -23,6 +23,12 @@ struct mpls_entry_decoded {
 	u8 bos;
 };
 
+struct mpls_dev {
+	int			fwd_enabled;
+
+	struct ctl_table_header *sysctl;
+};
+
 struct sk_buff;
 
 static inline struct mpls_shim_hdr *mpls_hdr(const struct sk_buff *skb)
-- 
2.1.4


* [PATCH net-next 5/5] mpls: Allow payload type to be associated with label routes
  2015-03-19 21:32 [PATCH net-next 0/5] mpls: Behaviour-changing improvements Robert Shearman
                   ` (3 preceding siblings ...)
  2015-03-19 21:32 ` [PATCH net-next 4/5] mpls: Per-device enabling of packet forwarding Robert Shearman
@ 2015-03-19 21:32 ` Robert Shearman
  2015-03-20 15:42 ` [PATCH net-next v2 0/5] mpls: Behaviour-changing improvements Robert Shearman
  5 siblings, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-03-19 21:32 UTC (permalink / raw)
  To: davem; +Cc: netdev, Robert Shearman, Eric W. Biederman

RFC 4182 s2 states that if an IPv4 Explicit NULL label is the only
label on the stack, then after popping the resulting packet must be
treated as an IPv4 packet and forwarded based on the IPv4 header. The
same is true for IPv6 Explicit NULL with an IPv6 packet following.

Therefore, when installing the IPv4/IPv6 Explicit NULL label routes,
add an attribute that specifies the expected payload type for use at
forwarding time for determining the type of the encapsulated packet
instead of inspecting the first nibble of the packet.
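
Roughly, the lookup order can be sketched in userspace as below. The
enum matches the one added in the diff; payload_from_nibble() and
resolve_payload() are hypothetical names that operate on a raw byte
buffer rather than an skb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum mpls_payload_type {
	MPT_UNSPEC, /* IPv4 or IPv6 */
	MPT_IPV4,
	MPT_IPV6,
};

/* Fallback: guess from the IP version in the first nibble. */
static enum mpls_payload_type payload_from_nibble(const uint8_t *pkt,
						  size_t len)
{
	if (len < 1)
		return MPT_UNSPEC;

	switch (pkt[0] >> 4) {
	case 4:
		return MPT_IPV4;
	case 6:
		return MPT_IPV6;
	default:
		return MPT_UNSPEC;
	}
}

/* A type configured on the route wins; the packet is only
 * inspected when the route leaves the type unspecified.
 */
static enum mpls_payload_type
resolve_payload(enum mpls_payload_type configured,
		const uint8_t *pkt, size_t len)
{
	assert(pkt || len == 0);
	if (configured != MPT_UNSPEC)
		return configured;
	return payload_from_nibble(pkt, len);
}
```

In this sketch a route marked MPT_IPV4 treats the payload as IPv4
whatever the first nibble says, which is the behaviour wanted for the
Explicit NULL routes.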

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 net/mpls/af_mpls.c | 87 ++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 55 insertions(+), 32 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 14c7e76..653bae1 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -23,13 +23,20 @@
 /* This maximum ha length copied from the definition of struct neighbour */
 #define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
 
+enum mpls_payload_type {
+	MPT_UNSPEC, /* IPv4 or IPv6 */
+	MPT_IPV4,
+	MPT_IPV6,
+};
+
 struct mpls_route { /* next hop label forwarding entry */
 	struct net_device __rcu *rt_dev;
 	struct rcu_head		rt_rcu;
 	u32			rt_label[MAX_NEW_LABELS];
 	u8			rt_protocol; /* routing protocol that set this entry */
 	u8                      rt_unlabeled : 1;
-	u8			rt_labels : 7;
+	u8                      rt_payload_type : 3;
+	u8			rt_labels : 4;
 	u8			rt_via_alen;
 	u8			rt_via_table;
 	u8			rt_via[0];
@@ -87,19 +94,24 @@ static bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 	return true;
 }
 
-static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
-			struct mpls_entry_decoded dec)
+static enum mpls_payload_type mpls_pkt_determine_af(struct sk_buff *skb)
 {
-	/* RFC4385 and RFC5586 encode other packets in mpls such that
-	 * they don't conflict with the ip version number, making
-	 * decoding by examining the ip version correct in everything
-	 * except for the strangest cases.
-	 *
-	 * The strange cases if we choose to support them will require
-	 * manual configuration.
-	 */
-	struct iphdr *hdr4;
-	bool success = true;
+	struct iphdr *hdr4 = ip_hdr(skb);
+
+	switch (hdr4->version) {
+	case 4:
+		return MPT_IPV4;
+	case 6:
+		return MPT_IPV6;
+	}
+
+	return MPT_UNSPEC;
+}
+
+static bool mpls_bos_egress(struct mpls_route *rt, struct sk_buff *skb,
+			    struct mpls_entry_decoded dec)
+{
+	enum mpls_payload_type payload_type;
 
 	/* The IPv4 code below accesses through the IPv4 header
 	 * checksum, which is 12 bytes into the packet.
@@ -114,24 +126,31 @@ static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
 	if (!pskb_may_pull(skb, 12))
 		return false;
 
-	/* Use ip_hdr to find the ip protocol version */
-	hdr4 = ip_hdr(skb);
-	if (hdr4->version == 4) {
+	payload_type = rt->rt_payload_type;
+	if (payload_type == MPT_UNSPEC)
+		payload_type = mpls_pkt_determine_af(skb);
+
+	switch (payload_type) {
+	case MPT_IPV4: {
+		struct iphdr *hdr4 = ip_hdr(skb);
 		skb->protocol = htons(ETH_P_IP);
 		csum_replace2(&hdr4->check,
 			      htons(hdr4->ttl << 8),
 			      htons(dec.ttl << 8));
 		hdr4->ttl = dec.ttl;
+		return true;
 	}
-	else if (hdr4->version == 6) {
+	case MPT_IPV6: {
 		struct ipv6hdr *hdr6 = ipv6_hdr(skb);
 		skb->protocol = htons(ETH_P_IPV6);
 		hdr6->hop_limit = dec.ttl;
+		return true;
 	}
-	else
-		/* version 0 and version 1 are used by pseudo wires */
-		success = false;
-	return success;
+	case MPT_UNSPEC:
+		break;
+	}
+
+	return false;
 }
 
 static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
@@ -210,7 +229,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	skb->protocol = htons(ETH_P_MPLS_UC);
 
 	if (unlikely(!new_header_size && dec.bos)) {
-		if (!mpls_egress(rt, skb, dec))
+		if (!mpls_bos_egress(rt, skb, dec))
 			goto drop;
 	} else if (rt->rt_unlabeled) {
 		/* Labeled traffic destined to unlabeled peer should
@@ -253,16 +272,17 @@ static const struct nla_policy rtm_mpls_policy[RTA_MAX+1] = {
 };
 
 struct mpls_route_config {
-	u32		rc_protocol;
-	u32		rc_ifindex;
-	u16		rc_via_table;
-	u16		rc_via_alen;
-	u8		rc_via[MAX_VIA_ALEN];
-	u32		rc_label;
-	u32		rc_output_labels;
-	u32		rc_output_label[MAX_NEW_LABELS];
-	u32		rc_nlflags;
-	struct nl_info	rc_nlinfo;
+	u32			rc_protocol;
+	u32			rc_ifindex;
+	u16			rc_via_table;
+	u16			rc_via_alen;
+	u8			rc_via[MAX_VIA_ALEN];
+	u32			rc_label;
+	u32			rc_output_labels;
+	u32			rc_output_label[MAX_NEW_LABELS];
+	u32			rc_nlflags;
+	enum mpls_payload_type	rc_payload_type;
+	struct nl_info		rc_nlinfo;
 };
 
 static struct mpls_route *mpls_rt_alloc(size_t alen)
@@ -413,6 +433,7 @@ static int mpls_route_add(struct mpls_route_config *cfg)
 	}
 	rt->rt_protocol = cfg->rc_protocol;
 	RCU_INIT_POINTER(rt->rt_dev, dev);
+	rt->rt_payload_type = cfg->rc_payload_type;
 	rt->rt_via_table = cfg->rc_via_table;
 	memcpy(rt->rt_via, cfg->rc_via, cfg->rc_via_alen);
 
@@ -948,6 +969,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
 			goto nort0;
 		RCU_INIT_POINTER(rt0->rt_dev, lo);
 		rt0->rt_protocol = RTPROT_KERNEL;
+		rt0->rt_payload_type = MPT_IPV4;
 		rt0->rt_via_table = NEIGH_LINK_TABLE;
 		memcpy(rt0->rt_via, lo->dev_addr, lo->addr_len);
 	}
@@ -958,6 +980,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
 			goto nort2;
 		RCU_INIT_POINTER(rt2->rt_dev, lo);
 		rt2->rt_protocol = RTPROT_KERNEL;
+		rt2->rt_payload_type = MPT_IPV6;
 		rt2->rt_via_table = NEIGH_LINK_TABLE;
 		memcpy(rt2->rt_via, lo->dev_addr, lo->addr_len);
 	}
-- 
2.1.4


* Re: [PATCH net-next 1/5] mpls: Use definition for reserved label checks
  2015-03-19 21:32 ` [PATCH net-next 1/5] mpls: Use definition for reserved label checks Robert Shearman
@ 2015-03-20  0:41   ` Eric W. Biederman
  2015-03-20 14:12     ` Robert Shearman
  0 siblings, 1 reply; 68+ messages in thread
From: Eric W. Biederman @ 2015-03-20  0:41 UTC (permalink / raw)
  To: Robert Shearman; +Cc: davem, netdev

Robert Shearman <rshearma@brocade.com> writes:

> In multiple locations there are checks for whether the label in hand
> is a reserved label or not using the arbitrary value of 16. Factor
> this out into a #define for better maintainability and for
> documentation.
>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: Robert Shearman <rshearma@brocade.com>
> ---
>  net/mpls/af_mpls.c  | 20 ++++++++++----------
>  net/mpls/internal.h |  1 +
>  2 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index db8a2ea..0d6763a 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -276,7 +276,7 @@ static void mpls_notify_route(struct net *net, unsigned index,
>  	struct mpls_route *rt = new ? new : old;
>  	unsigned nlm_flags = (old && new) ? NLM_F_REPLACE : 0;
>  	/* Ignore reserved labels for now */
> -	if (rt && (index >= 16))
> +	if (rt && (index >= LABEL_FIRST_UNRESERVED))
>  		rtmsg_lfib(event, index, rt, nlh, net, portid, nlm_flags);
>  }
>  
> @@ -310,7 +310,7 @@ static unsigned find_free_label(struct net *net)
>  
>  	platform_label = rtnl_dereference(net->mpls.platform_label);
>  	platform_labels = net->mpls.platform_labels;
> -	for (index = 16; index < platform_labels; index++) {
> +	for (index = LABEL_FIRST_UNRESERVED; index < platform_labels; index++) {
>  		if (!rtnl_dereference(platform_label[index]))
>  			return index;
>  	}
> @@ -335,8 +335,8 @@ static int mpls_route_add(struct mpls_route_config *cfg)
>  		index = find_free_label(net);
>  	}
>  
> -	/* The first 16 labels are reserved, and may not be set */
> -	if (index < 16)
> +	/* Reserved labels may not be set */
> +	if (index < LABEL_FIRST_UNRESERVED)
>  		goto errout;
>  
>  	/* The full 20 bit range may not be supported. */
> @@ -413,8 +413,8 @@ static int mpls_route_del(struct mpls_route_config *cfg)
>  
>  	index = cfg->rc_label;
>  
> -	/* The first 16 labels are reserved, and may not be removed */
> -	if (index < 16)
> +	/* Reserved labels may not be removed */
> +	if (index < LABEL_FIRST_UNRESERVED)
>  		goto errout;
>  
>  	/* The full 20 bit range may not be supported */
> @@ -610,8 +610,8 @@ static int rtm_to_route_config(struct sk_buff *skb,  struct nlmsghdr *nlh,
>  					   &cfg->rc_label))
>  				goto errout;
>  
> -			/* The first 16 labels are reserved, and may not be set */
> -			if (cfg->rc_label < 16)
> +			/* Reserved labels may not be set */
> +			if (cfg->rc_label < LABEL_FIRST_UNRESERVED)
>  				goto errout;
>  
>  			break;
> @@ -736,8 +736,8 @@ static int mpls_dump_routes(struct sk_buff *skb, struct netlink_callback *cb)
>  	ASSERT_RTNL();
>  
>  	index = cb->args[0];
> -	if (index < 16)
> -		index = 16;
> +	if (index < LABEL_FIRST_UNRESERVED)
> +		index = LABEL_FIRST_UNRESERVED;
>  
>  	platform_label = rtnl_dereference(net->mpls.platform_label);
>  	platform_labels = net->mpls.platform_labels;
> diff --git a/net/mpls/internal.h b/net/mpls/internal.h
> index fb6de92..d06dff9 100644
> --- a/net/mpls/internal.h
> +++ b/net/mpls/internal.h
> @@ -9,6 +9,7 @@
>  #define LABEL_GAL			13 /* RFC5586 */
>  #define LABEL_OAM_ALERT			14 /* RFC3429 */
>  #define LABEL_EXTENSION			15 /* RFC7274 */
> +#define LABEL_FIRST_UNRESERVED		16 /* RFC7274 */

This should reference RFC3032 not RFC7274 as RFC3032 is what defines
the first 16 labels as reserved.

Eric


* Re: [PATCH net-next 1/5] mpls: Use definition for reserved label checks
  2015-03-20  0:41   ` Eric W. Biederman
@ 2015-03-20 14:12     ` Robert Shearman
  0 siblings, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-03-20 14:12 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: davem, netdev

On 20/03/15 00:41, Eric W. Biederman wrote:
> Robert Shearman <rshearma@brocade.com> writes:
>> diff --git a/net/mpls/internal.h b/net/mpls/internal.h
>> index fb6de92..d06dff9 100644
>> --- a/net/mpls/internal.h
>> +++ b/net/mpls/internal.h
>> @@ -9,6 +9,7 @@
>>   #define LABEL_GAL			13 /* RFC5586 */
>>   #define LABEL_OAM_ALERT			14 /* RFC3429 */
>>   #define LABEL_EXTENSION			15 /* RFC7274 */
>> +#define LABEL_FIRST_UNRESERVED		16 /* RFC7274 */
>
> This should reference RFC3032 not RFC7274 as RFC3032 is what defines
> the first 16 labels as reserved.

Thanks - I'll fix that and resend the series.

Rob


* [PATCH net-next v2 0/5] mpls: Behaviour-changing improvements
  2015-03-19 21:32 [PATCH net-next 0/5] mpls: Behaviour-changing improvements Robert Shearman
                   ` (4 preceding siblings ...)
  2015-03-19 21:32 ` [PATCH net-next 5/5] mpls: Allow payload type to be associated with label routes Robert Shearman
@ 2015-03-20 15:42 ` Robert Shearman
  2015-03-20 15:42   ` [PATCH net-next v2 1/5] mpls: Use definition for reserved label checks Robert Shearman
                     ` (5 more replies)
  5 siblings, 6 replies; 68+ messages in thread
From: Robert Shearman @ 2015-03-20 15:42 UTC (permalink / raw)
  To: davem; +Cc: netdev, Robert Shearman

Updated to reference the correct RFC in the first patch.

This series consists of several small changes to make it easier to
understand the code, along with security and RFC-compliance
changes. These are important to consider before userspace begins
relying on the previous behaviour.

Robert Shearman (5):
  mpls: Use definition for reserved label checks
  mpls: Remove incorrect PHP comment
  mpls: Differentiate implicit-null and unlabeled neighbours
  mpls: Per-device enabling of packet forwarding
  mpls: Allow payload type to be associated with label routes

 Documentation/networking/mpls-sysctl.txt |   9 ++
 include/linux/netdevice.h                |   4 +
 net/mpls/af_mpls.c                       | 242 +++++++++++++++++++++++++------
 net/mpls/internal.h                      |   7 +
 4 files changed, 215 insertions(+), 47 deletions(-)

-- 
2.1.4


* [PATCH net-next v2 1/5] mpls: Use definition for reserved label checks
  2015-03-20 15:42 ` [PATCH net-next v2 0/5] mpls: Behaviour-changing improvements Robert Shearman
@ 2015-03-20 15:42   ` Robert Shearman
  2015-03-22 19:09     ` Eric W. Biederman
  2015-03-20 15:42   ` [PATCH net-next v2 2/5] mpls: Remove incorrect PHP comment Robert Shearman
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 68+ messages in thread
From: Robert Shearman @ 2015-03-20 15:42 UTC (permalink / raw)
  To: davem; +Cc: netdev, Robert Shearman, Eric W. Biederman

In multiple locations there are checks for whether the label in hand
is a reserved label or not using the arbitrary value of 16. Factor
this out into a #define for better maintainability and for
documentation.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 net/mpls/af_mpls.c  | 20 ++++++++++----------
 net/mpls/internal.h |  1 +
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index db8a2ea..0d6763a 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -276,7 +276,7 @@ static void mpls_notify_route(struct net *net, unsigned index,
 	struct mpls_route *rt = new ? new : old;
 	unsigned nlm_flags = (old && new) ? NLM_F_REPLACE : 0;
 	/* Ignore reserved labels for now */
-	if (rt && (index >= 16))
+	if (rt && (index >= LABEL_FIRST_UNRESERVED))
 		rtmsg_lfib(event, index, rt, nlh, net, portid, nlm_flags);
 }
 
@@ -310,7 +310,7 @@ static unsigned find_free_label(struct net *net)
 
 	platform_label = rtnl_dereference(net->mpls.platform_label);
 	platform_labels = net->mpls.platform_labels;
-	for (index = 16; index < platform_labels; index++) {
+	for (index = LABEL_FIRST_UNRESERVED; index < platform_labels; index++) {
 		if (!rtnl_dereference(platform_label[index]))
 			return index;
 	}
@@ -335,8 +335,8 @@ static int mpls_route_add(struct mpls_route_config *cfg)
 		index = find_free_label(net);
 	}
 
-	/* The first 16 labels are reserved, and may not be set */
-	if (index < 16)
+	/* Reserved labels may not be set */
+	if (index < LABEL_FIRST_UNRESERVED)
 		goto errout;
 
 	/* The full 20 bit range may not be supported. */
@@ -413,8 +413,8 @@ static int mpls_route_del(struct mpls_route_config *cfg)
 
 	index = cfg->rc_label;
 
-	/* The first 16 labels are reserved, and may not be removed */
-	if (index < 16)
+	/* Reserved labels may not be removed */
+	if (index < LABEL_FIRST_UNRESERVED)
 		goto errout;
 
 	/* The full 20 bit range may not be supported */
@@ -610,8 +610,8 @@ static int rtm_to_route_config(struct sk_buff *skb,  struct nlmsghdr *nlh,
 					   &cfg->rc_label))
 				goto errout;
 
-			/* The first 16 labels are reserved, and may not be set */
-			if (cfg->rc_label < 16)
+			/* Reserved labels may not be set */
+			if (cfg->rc_label < LABEL_FIRST_UNRESERVED)
 				goto errout;
 
 			break;
@@ -736,8 +736,8 @@ static int mpls_dump_routes(struct sk_buff *skb, struct netlink_callback *cb)
 	ASSERT_RTNL();
 
 	index = cb->args[0];
-	if (index < 16)
-		index = 16;
+	if (index < LABEL_FIRST_UNRESERVED)
+		index = LABEL_FIRST_UNRESERVED;
 
 	platform_label = rtnl_dereference(net->mpls.platform_label);
 	platform_labels = net->mpls.platform_labels;
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index fb6de92..5732283 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -9,6 +9,7 @@
 #define LABEL_GAL			13 /* RFC5586 */
 #define LABEL_OAM_ALERT			14 /* RFC3429 */
 #define LABEL_EXTENSION			15 /* RFC7274 */
+#define LABEL_FIRST_UNRESERVED		16 /* RFC3032 */
 
 
 struct mpls_shim_hdr {
-- 
2.1.4


* [PATCH net-next v2 2/5] mpls: Remove incorrect PHP comment
  2015-03-20 15:42 ` [PATCH net-next v2 0/5] mpls: Behaviour-changing improvements Robert Shearman
  2015-03-20 15:42   ` [PATCH net-next v2 1/5] mpls: Use definition for reserved label checks Robert Shearman
@ 2015-03-20 15:42   ` Robert Shearman
  2015-03-22 19:12     ` Eric W. Biederman
  2015-03-20 15:42   ` [PATCH net-next v2 3/5] mpls: Differentiate implicit-null and unlabeled neighbours Robert Shearman
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 68+ messages in thread
From: Robert Shearman @ 2015-03-20 15:42 UTC (permalink / raw)
  To: davem; +Cc: netdev, Robert Shearman, Eric W. Biederman

Popping the last label on the stack does not necessarily imply
performing penultimate hop popping. There is no reason why this
couldn't be the last hop in the network, so remove the comment.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 net/mpls/af_mpls.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 0d6763a..bf3459a 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -199,7 +199,6 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	skb->protocol = htons(ETH_P_MPLS_UC);
 
 	if (unlikely(!new_header_size && dec.bos)) {
-		/* Penultimate hop popping */
 		if (!mpls_egress(rt, skb, dec))
 			goto drop;
 	} else {
-- 
2.1.4


* [PATCH net-next v2 3/5] mpls: Differentiate implicit-null and unlabeled neighbours
  2015-03-20 15:42 ` [PATCH net-next v2 0/5] mpls: Behaviour-changing improvements Robert Shearman
  2015-03-20 15:42   ` [PATCH net-next v2 1/5] mpls: Use definition for reserved label checks Robert Shearman
  2015-03-20 15:42   ` [PATCH net-next v2 2/5] mpls: Remove incorrect PHP comment Robert Shearman
@ 2015-03-20 15:42   ` Robert Shearman
  2015-03-22 19:49     ` Eric W. Biederman
  2015-03-20 15:42   ` [PATCH net-next v2 4/5] mpls: Per-device enabling of packet forwarding Robert Shearman
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 68+ messages in thread
From: Robert Shearman @ 2015-03-20 15:42 UTC (permalink / raw)
  To: davem; +Cc: netdev, Robert Shearman, Eric W. Biederman

The control plane can advertise labels for neighbours that don't have
an outgoing label. RFC 3032 s3.22 states that either the remaining
labels should be popped (if the control plane can determine that it's
safe to do so, which in light of MPLS-VPN, RFC 4364, is never the case
now) or that the packet should be discarded.

Therefore, if the peer is unlabeled and the last label wasn't popped
then drop the packet. The peer being unlabeled is signalled by an
empty label stack. However, implicit-null still needs to be supported
(i.e. penultimate hop popping) where the incoming label is popped and
no labels are put on and the packet can still go out labeled with the
unpopped part of the stack. This is achieved by the control plane
specifying a label stack consisting of the single special
implicit-null value.
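
The distinction described above can be sketched in isolation as follows. This is an illustrative userspace model, assuming the semantics in this commit message; the enum, classify_output_stack() and must_drop() are hypothetical names and do not appear in the patch. LABEL_IMPLICIT_NULL is the RFC 3032 value 3.

```c
#include <stdbool.h>
#include <stdint.h>

#define LABEL_IMPLICIT_NULL 3 /* RFC3032 */

/* Hypothetical classification of the configured output label stack. */
enum nh_kind {
	NH_UNLABELED,     /* empty stack: peer advertised no label */
	NH_IMPLICIT_NULL, /* single implicit-null: pop and push nothing */
	NH_LABELED,       /* one or more real labels to push */
};

static enum nh_kind classify_output_stack(const uint32_t *labels,
					  unsigned int count)
{
	if (count == 0)
		return NH_UNLABELED;
	if (count == 1 && labels[0] == LABEL_IMPLICIT_NULL)
		return NH_IMPLICIT_NULL;
	return NH_LABELED;
}

/* Labeled traffic towards an unlabeled peer is dropped unless the label
 * just popped was the bottom of the stack.
 */
static bool must_drop(enum nh_kind kind, bool popped_bos)
{
	return kind == NH_UNLABELED && !popped_bos;
}
```

With implicit-null, a packet that still carries labels after the pop is forwarded with the remainder of its stack intact, which is the penultimate hop popping case.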

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 net/mpls/af_mpls.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index bf3459a..e3586a7 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -28,7 +28,8 @@ struct mpls_route { /* next hop label forwarding entry */
 	struct rcu_head		rt_rcu;
 	u32			rt_label[MAX_NEW_LABELS];
 	u8			rt_protocol; /* routing protocol that set this entry */
-	u8			rt_labels;
+	u8                      rt_unlabeled : 1;
+	u8			rt_labels : 7;
 	u8			rt_via_alen;
 	u8			rt_via_table;
 	u8			rt_via[0];
@@ -201,6 +202,11 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	if (unlikely(!new_header_size && dec.bos)) {
 		if (!mpls_egress(rt, skb, dec))
 			goto drop;
+	} else if (rt->rt_unlabeled) {
+		/* Labeled traffic destined to unlabeled peer should
+		 * be discarded
+		 */
+		goto drop;
 	} else {
 		bool bos;
 		int i;
@@ -385,9 +391,16 @@ static int mpls_route_add(struct mpls_route_config *cfg)
 	if (!rt)
 		goto errout;
 
-	rt->rt_labels = cfg->rc_output_labels;
-	for (i = 0; i < rt->rt_labels; i++)
-		rt->rt_label[i] = cfg->rc_output_label[i];
+	if (cfg->rc_output_labels == 1 &&
+	    cfg->rc_output_label[0] == LABEL_IMPLICIT_NULL) {
+		rt->rt_labels = 0;
+	} else {
+		rt->rt_labels = cfg->rc_output_labels;
+		for (i = 0; i < rt->rt_labels; i++)
+			rt->rt_label[i] = cfg->rc_output_label[i];
+		if (!rt->rt_labels)
+			rt->rt_unlabeled = true;
+	}
 	rt->rt_protocol = cfg->rc_protocol;
 	RCU_INIT_POINTER(rt->rt_dev, dev);
 	rt->rt_via_table = cfg->rc_via_table;
-- 
2.1.4


* [PATCH net-next v2 4/5] mpls: Per-device enabling of packet forwarding
  2015-03-20 15:42 ` [PATCH net-next v2 0/5] mpls: Behaviour-changing improvements Robert Shearman
                     ` (2 preceding siblings ...)
  2015-03-20 15:42   ` [PATCH net-next v2 3/5] mpls: Differentiate implicit-null and unlabeled neighbours Robert Shearman
@ 2015-03-20 15:42   ` Robert Shearman
  2015-03-22 20:02     ` Eric W. Biederman
  2015-03-20 15:42   ` [PATCH net-next v2 5/5] mpls: Allow payload type to be associated with label routes Robert Shearman
  2015-03-30 18:15   ` [PATCH net-next v3 0/4] mpls: Behaviour-changing improvements Robert Shearman
  5 siblings, 1 reply; 68+ messages in thread
From: Robert Shearman @ 2015-03-20 15:42 UTC (permalink / raw)
  To: davem; +Cc: netdev, Robert Shearman, Eric W. Biederman

An MPLS network is a single trust domain where the edges must be in
control of what labels make their way into the core. The simplest way
of ensuring this is for the edge device to always impose the labels,
and not forward labeled traffic from untrusted neighbours. This is
achieved by allowing a per-device configuration of whether MPLS
traffic received over that interface should be forwarded or not.

To be secure by default, MPLS is now initially disabled on all
interfaces (except the loopback) until explicitly enabled and no
global option is provided to change the default. Whilst this differs
from other protocols (e.g. IPv6), network operators are used to
explicitly enabling MPLS forwarding on interfaces, and with the number
of links to the MPLS core typically fairly low this doesn't present
too much of a burden on operators.
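
The default policy described above boils down to roughly the following. This is a simplified userspace sketch, not the kernel code: struct mpls_dev_sketch, mpls_dev_sketch_init() and mpls_input_allowed() are hypothetical stand-ins for the per-device state and the check made at the top of mpls_forward().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device MPLS state. */
struct mpls_dev_sketch {
	int fwd_enabled; /* 0 = drop labeled input, non-zero = forward */
};

/* Forwarding starts disabled everywhere except on loopback. */
static void mpls_dev_sketch_init(struct mpls_dev_sketch *mdev,
				 bool is_loopback)
{
	mdev->fwd_enabled = is_loopback ? 1 : 0;
}

/* A device with no MPLS state, or with forwarding off, drops input. */
static bool mpls_input_allowed(const struct mpls_dev_sketch *mdev)
{
	return mdev && mdev->fwd_enabled;
}
```

An operator would then flip the per-interface "forwarding" sysctl documented in this patch to enable labeled input on the links facing the MPLS core.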

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 Documentation/networking/mpls-sysctl.txt |   9 +++
 include/linux/netdevice.h                |   4 ++
 net/mpls/af_mpls.c                       | 115 ++++++++++++++++++++++++++++++-
 net/mpls/internal.h                      |   6 ++
 4 files changed, 133 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/mpls-sysctl.txt b/Documentation/networking/mpls-sysctl.txt
index 639ddf0..f48772c 100644
--- a/Documentation/networking/mpls-sysctl.txt
+++ b/Documentation/networking/mpls-sysctl.txt
@@ -18,3 +18,12 @@ platform_labels - INTEGER
 
 	Possible values: 0 - 1048575
 	Default: 0
+
+conf/<interface>/forwarding - BOOL
+	Forward packets received on this interface.
+
+	If disabled, packets will be discarded without further
+	processing.
+
+	0 - disabled (default)
+	not 0 - enabled
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 76951c5..ee4ca06 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -60,6 +60,7 @@ struct phy_device;
 struct wireless_dev;
 /* 802.15.4 specific */
 struct wpan_dev;
+struct mpls_dev;
 
 void netdev_set_default_ethtool_ops(struct net_device *dev,
 				    const struct ethtool_ops *ops);
@@ -1615,6 +1616,9 @@ struct net_device {
 	void			*ax25_ptr;
 	struct wireless_dev	*ieee80211_ptr;
 	struct wpan_dev		*ieee802154_ptr;
+#if IS_ENABLED(CONFIG_MPLS_ROUTING)
+	struct mpls_dev __rcu	*mpls_ptr;
+#endif
 
 /*
  * Cache lines mostly used on receive path (including eth_type_trans())
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index e3586a7..14c7e76 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -54,6 +54,11 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
 	return rt;
 }
 
+static inline struct mpls_dev *mpls_dev_get(const struct net_device *dev)
+{
+	return rcu_dereference_rtnl(dev->mpls_ptr);
+}
+
 static bool mpls_output_possible(const struct net_device *dev)
 {
 	return dev && (dev->flags & IFF_UP) && netif_carrier_ok(dev);
@@ -137,6 +142,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	struct mpls_route *rt;
 	struct mpls_entry_decoded dec;
 	struct net_device *out_dev;
+	struct mpls_dev *mdev;
 	unsigned int hh_len;
 	unsigned int new_header_size;
 	unsigned int mtu;
@@ -144,6 +150,10 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 
 	/* Careful this entire function runs inside of an rcu critical section */
 
+	mdev = mpls_dev_get(dev);
+	if (!mdev || !mdev->fwd_enabled)
+		goto drop;
+
 	if (skb->pkt_type != PACKET_HOST)
 		goto drop;
 
@@ -440,10 +450,96 @@ errout:
 	return err;
 }
 
+#define MPLS_PERDEV_SYSCTL_OFFSET(field)	\
+	(&((struct mpls_dev *)0)->field)
+
+static const struct ctl_table mpls_dev_table[] = {
+	{
+		.procname	= "forwarding",
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+		.data		= MPLS_PERDEV_SYSCTL_OFFSET(fwd_enabled),
+	},
+	{ }
+};
+
+static int mpls_dev_sysctl_register(struct net_device *dev,
+				    struct mpls_dev *mdev)
+{
+	char path[sizeof("net/mpls/conf/") + IFNAMSIZ];
+	struct ctl_table *table;
+	int i;
+
+	table = kmemdup(&mpls_dev_table, sizeof(mpls_dev_table), GFP_KERNEL);
+	if (!table)
+		goto out;
+
+	/* Table data contains only offsets relative to the base of
+	 * the mdev at this point, so make them absolute.
+	 */
+	for (i = 0; i < ARRAY_SIZE(mpls_dev_table); i++)
+		table[i].data = (char *)mdev + (uintptr_t)table[i].data;
+
+	snprintf(path, sizeof(path), "net/mpls/conf/%s", dev->name);
+
+	mdev->sysctl = register_net_sysctl(dev_net(dev), path, table);
+	if (!mdev->sysctl)
+		goto free;
+
+	return 0;
+
+free:
+	kfree(table);
+out:
+	return -ENOBUFS;
+}
+
+static void mpls_dev_sysctl_unregister(struct mpls_dev *mdev)
+{
+	struct ctl_table *table;
+
+	table = mdev->sysctl->ctl_table_arg;
+	unregister_net_sysctl_table(mdev->sysctl);
+	kfree(table);
+}
+
+static struct mpls_dev *mpls_add_dev(struct net_device *dev)
+{
+	struct mpls_dev *mdev;
+	int err = -ENOMEM;
+
+	ASSERT_RTNL();
+
+	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
+	if (!mdev)
+		return ERR_PTR(err);
+
+	/* Enable MPLS by default on loopback devices, since this
+	 * doesn't represent a security boundary and is required for the
+	 * lookup of inner labels for LSPs terminating on this router.
+	 */
+	if (dev->flags & IFF_LOOPBACK)
+		mdev->fwd_enabled = 1;
+
+	err = mpls_dev_sysctl_register(dev, mdev);
+	if (err)
+		goto free;
+
+	rcu_assign_pointer(dev->mpls_ptr, mdev);
+
+	return mdev;
+
+free:
+	kfree(mdev);
+	return ERR_PTR(err);
+}
+
 static void mpls_ifdown(struct net_device *dev)
 {
 	struct mpls_route __rcu **platform_label;
 	struct net *net = dev_net(dev);
+	struct mpls_dev *mdev;
 	unsigned index;
 
 	platform_label = rtnl_dereference(net->mpls.platform_label);
@@ -455,14 +551,31 @@ static void mpls_ifdown(struct net_device *dev)
 			continue;
 		rt->rt_dev = NULL;
 	}
+
+	mdev = mpls_dev_get(dev);
+	if (!mdev)
+		return;
+
+	mpls_dev_sysctl_unregister(mdev);
+
+	RCU_INIT_POINTER(dev->mpls_ptr, NULL);
+
+	kfree(mdev);
 }
 
 static int mpls_dev_notify(struct notifier_block *this, unsigned long event,
 			   void *ptr)
 {
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct mpls_dev *mdev;
 
 	switch(event) {
+	case NETDEV_REGISTER:
+		mdev = mpls_add_dev(dev);
+		if (IS_ERR(mdev))
+			return notifier_from_errno(PTR_ERR(mdev));
+		break;
+
 	case NETDEV_UNREGISTER:
 		mpls_ifdown(dev);
 		break;
@@ -924,7 +1037,7 @@ static int mpls_platform_labels(struct ctl_table *table, int write,
 	return ret;
 }
 
-static struct ctl_table mpls_table[] = {
+static const struct ctl_table mpls_table[] = {
 	{
 		.procname	= "platform_labels",
 		.data		= NULL,
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index 5732283..e676a43 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -23,6 +23,12 @@ struct mpls_entry_decoded {
 	u8 bos;
 };
 
+struct mpls_dev {
+	int			fwd_enabled;
+
+	struct ctl_table_header *sysctl;
+};
+
 struct sk_buff;
 
 static inline struct mpls_shim_hdr *mpls_hdr(const struct sk_buff *skb)
-- 
2.1.4


* [PATCH net-next v2 5/5] mpls: Allow payload type to be associated with label routes
  2015-03-20 15:42 ` [PATCH net-next v2 0/5] mpls: Behaviour-changing improvements Robert Shearman
                     ` (3 preceding siblings ...)
  2015-03-20 15:42   ` [PATCH net-next v2 4/5] mpls: Per-device enabling of packet forwarding Robert Shearman
@ 2015-03-20 15:42   ` Robert Shearman
  2015-03-22 20:56     ` Eric W. Biederman
  2015-03-30 18:15   ` [PATCH net-next v3 0/4] mpls: Behaviour-changing improvements Robert Shearman
  5 siblings, 1 reply; 68+ messages in thread
From: Robert Shearman @ 2015-03-20 15:42 UTC (permalink / raw)
  To: davem; +Cc: netdev, Robert Shearman, Eric W. Biederman

RFC 4182 s2 states that if an IPv4 Explicit NULL label is the only
label on the stack, then after popping the resulting packet must be
treated as an IPv4 packet and forwarded based on the IPv4 header. The
same is true for IPv6 Explicit NULL with an IPv6 packet following.

Therefore, when installing the IPv4/IPv6 Explicit NULL label routes,
add an attribute that specifies the expected payload type for use at
forwarding time for determining the type of the encapsulated packet
instead of inspecting the first nibble of the packet.
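
The lookup order described above can be modelled as below. This is a sketch under the assumptions of this commit message; the PT_* enum and both function names are hypothetical and only loosely mirror the MPT_* values and helpers in the diff.

```c
#include <stdint.h>

/* Hypothetical counterpart of the payload type attribute. */
enum payload_type {
	PT_UNSPEC, /* not configured: fall back to inspecting the packet */
	PT_IPV4,
	PT_IPV6,
};

/* Guess the payload type from the IP version nibble of the first byte. */
static enum payload_type payload_from_nibble(const uint8_t *payload)
{
	switch (payload[0] >> 4) {
	case 4:
		return PT_IPV4;
	case 6:
		return PT_IPV6;
	default:
		return PT_UNSPEC; /* e.g. pseudowire control word */
	}
}

/* A type configured on the route wins; otherwise inspect the packet. */
static enum payload_type resolve_payload_type(enum payload_type route_type,
					      const uint8_t *payload)
{
	if (route_type != PT_UNSPEC)
		return route_type;
	return payload_from_nibble(payload);
}
```

For the explicit NULL routes the configured type is always set, so the nibble is never consulted for them.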

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 net/mpls/af_mpls.c | 87 ++++++++++++++++++++++++++++++++++--------------------
 1 file changed, 55 insertions(+), 32 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 14c7e76..653bae1 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -23,13 +23,20 @@
 /* This maximum ha length copied from the definition of struct neighbour */
 #define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
 
+enum mpls_payload_type {
+	MPT_UNSPEC, /* IPv4 or IPv6 */
+	MPT_IPV4,
+	MPT_IPV6,
+};
+
 struct mpls_route { /* next hop label forwarding entry */
 	struct net_device __rcu *rt_dev;
 	struct rcu_head		rt_rcu;
 	u32			rt_label[MAX_NEW_LABELS];
 	u8			rt_protocol; /* routing protocol that set this entry */
 	u8                      rt_unlabeled : 1;
-	u8			rt_labels : 7;
+	u8                      rt_payload_type : 3;
+	u8			rt_labels : 4;
 	u8			rt_via_alen;
 	u8			rt_via_table;
 	u8			rt_via[0];
@@ -87,19 +94,24 @@ static bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 	return true;
 }
 
-static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
-			struct mpls_entry_decoded dec)
+static enum mpls_payload_type mpls_pkt_determine_af(struct sk_buff *skb)
 {
-	/* RFC4385 and RFC5586 encode other packets in mpls such that
-	 * they don't conflict with the ip version number, making
-	 * decoding by examining the ip version correct in everything
-	 * except for the strangest cases.
-	 *
-	 * The strange cases if we choose to support them will require
-	 * manual configuration.
-	 */
-	struct iphdr *hdr4;
-	bool success = true;
+	struct iphdr *hdr4 = ip_hdr(skb);
+
+	switch (hdr4->version) {
+	case 4:
+		return MPT_IPV4;
+	case 6:
+		return MPT_IPV6;
+	}
+
+	return MPT_UNSPEC;
+}
+
+static bool mpls_bos_egress(struct mpls_route *rt, struct sk_buff *skb,
+			    struct mpls_entry_decoded dec)
+{
+	enum mpls_payload_type payload_type;
 
 	/* The IPv4 code below accesses through the IPv4 header
 	 * checksum, which is 12 bytes into the packet.
@@ -114,24 +126,31 @@ static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
 	if (!pskb_may_pull(skb, 12))
 		return false;
 
-	/* Use ip_hdr to find the ip protocol version */
-	hdr4 = ip_hdr(skb);
-	if (hdr4->version == 4) {
+	payload_type = rt->rt_payload_type;
+	if (payload_type == MPT_UNSPEC)
+		payload_type = mpls_pkt_determine_af(skb);
+
+	switch (payload_type) {
+	case MPT_IPV4: {
+		struct iphdr *hdr4 = ip_hdr(skb);
 		skb->protocol = htons(ETH_P_IP);
 		csum_replace2(&hdr4->check,
 			      htons(hdr4->ttl << 8),
 			      htons(dec.ttl << 8));
 		hdr4->ttl = dec.ttl;
+		return true;
 	}
-	else if (hdr4->version == 6) {
+	case MPT_IPV6: {
 		struct ipv6hdr *hdr6 = ipv6_hdr(skb);
 		skb->protocol = htons(ETH_P_IPV6);
 		hdr6->hop_limit = dec.ttl;
+		return true;
 	}
-	else
-		/* version 0 and version 1 are used by pseudo wires */
-		success = false;
-	return success;
+	case MPT_UNSPEC:
+		break;
+	}
+
+	return false;
 }
 
 static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
@@ -210,7 +229,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	skb->protocol = htons(ETH_P_MPLS_UC);
 
 	if (unlikely(!new_header_size && dec.bos)) {
-		if (!mpls_egress(rt, skb, dec))
+		if (!mpls_bos_egress(rt, skb, dec))
 			goto drop;
 	} else if (rt->rt_unlabeled) {
 		/* Labeled traffic destined to unlabeled peer should
@@ -253,16 +272,17 @@ static const struct nla_policy rtm_mpls_policy[RTA_MAX+1] = {
 };
 
 struct mpls_route_config {
-	u32		rc_protocol;
-	u32		rc_ifindex;
-	u16		rc_via_table;
-	u16		rc_via_alen;
-	u8		rc_via[MAX_VIA_ALEN];
-	u32		rc_label;
-	u32		rc_output_labels;
-	u32		rc_output_label[MAX_NEW_LABELS];
-	u32		rc_nlflags;
-	struct nl_info	rc_nlinfo;
+	u32			rc_protocol;
+	u32			rc_ifindex;
+	u16			rc_via_table;
+	u16			rc_via_alen;
+	u8			rc_via[MAX_VIA_ALEN];
+	u32			rc_label;
+	u32			rc_output_labels;
+	u32			rc_output_label[MAX_NEW_LABELS];
+	u32			rc_nlflags;
+	enum mpls_payload_type	rc_payload_type;
+	struct nl_info		rc_nlinfo;
 };
 
 static struct mpls_route *mpls_rt_alloc(size_t alen)
@@ -413,6 +433,7 @@ static int mpls_route_add(struct mpls_route_config *cfg)
 	}
 	rt->rt_protocol = cfg->rc_protocol;
 	RCU_INIT_POINTER(rt->rt_dev, dev);
+	rt->rt_payload_type = cfg->rc_payload_type;
 	rt->rt_via_table = cfg->rc_via_table;
 	memcpy(rt->rt_via, cfg->rc_via, cfg->rc_via_alen);
 
@@ -948,6 +969,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
 			goto nort0;
 		RCU_INIT_POINTER(rt0->rt_dev, lo);
 		rt0->rt_protocol = RTPROT_KERNEL;
+		rt0->rt_payload_type = MPT_IPV4;
 		rt0->rt_via_table = NEIGH_LINK_TABLE;
 		memcpy(rt0->rt_via, lo->dev_addr, lo->addr_len);
 	}
@@ -958,6 +980,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
 			goto nort2;
 		RCU_INIT_POINTER(rt2->rt_dev, lo);
 		rt2->rt_protocol = RTPROT_KERNEL;
+		rt2->rt_payload_type = MPT_IPV6;
 		rt2->rt_via_table = NEIGH_LINK_TABLE;
 		memcpy(rt2->rt_via, lo->dev_addr, lo->addr_len);
 	}
-- 
2.1.4


* Re: [PATCH net-next v2 1/5] mpls: Use definition for reserved label checks
  2015-03-20 15:42   ` [PATCH net-next v2 1/5] mpls: Use definition for reserved label checks Robert Shearman
@ 2015-03-22 19:09     ` Eric W. Biederman
  0 siblings, 0 replies; 68+ messages in thread
From: Eric W. Biederman @ 2015-03-22 19:09 UTC (permalink / raw)
  To: Robert Shearman; +Cc: davem, netdev

Robert Shearman <rshearma@brocade.com> writes:

> In multiple locations there are checks for whether the label in hand
> is a reserved label or not using the arbitrary value of 16. Factor
> this out into a #define for better maintainability and for
> documentation.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: Robert Shearman <rshearma@brocade.com>
> ---
>  net/mpls/af_mpls.c  | 20 ++++++++++----------
>  net/mpls/internal.h |  1 +
>  2 files changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index db8a2ea..0d6763a 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -276,7 +276,7 @@ static void mpls_notify_route(struct net *net, unsigned index,
>  	struct mpls_route *rt = new ? new : old;
>  	unsigned nlm_flags = (old && new) ? NLM_F_REPLACE : 0;
>  	/* Ignore reserved labels for now */
> -	if (rt && (index >= 16))
> +	if (rt && (index >= LABEL_FIRST_UNRESERVED))
>  		rtmsg_lfib(event, index, rt, nlh, net, portid, nlm_flags);
>  }
>  
> @@ -310,7 +310,7 @@ static unsigned find_free_label(struct net *net)
>  
>  	platform_label = rtnl_dereference(net->mpls.platform_label);
>  	platform_labels = net->mpls.platform_labels;
> -	for (index = 16; index < platform_labels; index++) {
> +	for (index = LABEL_FIRST_UNRESERVED; index < platform_labels; index++) {
>  		if (!rtnl_dereference(platform_label[index]))
>  			return index;
>  	}
> @@ -335,8 +335,8 @@ static int mpls_route_add(struct mpls_route_config *cfg)
>  		index = find_free_label(net);
>  	}
>  
> -	/* The first 16 labels are reserved, and may not be set */
> -	if (index < 16)
> +	/* Reserved labels may not be set */
> +	if (index < LABEL_FIRST_UNRESERVED)
>  		goto errout;
>  
>  	/* The full 20 bit range may not be supported. */
> @@ -413,8 +413,8 @@ static int mpls_route_del(struct mpls_route_config *cfg)
>  
>  	index = cfg->rc_label;
>  
> -	/* The first 16 labels are reserved, and may not be removed */
> -	if (index < 16)
> +	/* Reserved labels may not be removed */
> +	if (index < LABEL_FIRST_UNRESERVED)
>  		goto errout;
>  
>  	/* The full 20 bit range may not be supported */
> @@ -610,8 +610,8 @@ static int rtm_to_route_config(struct sk_buff *skb,  struct nlmsghdr *nlh,
>  					   &cfg->rc_label))
>  				goto errout;
>  
> -			/* The first 16 labels are reserved, and may not be set */
> -			if (cfg->rc_label < 16)
> +			/* Reserved labels may not be set */
> +			if (cfg->rc_label < LABEL_FIRST_UNRESERVED)
>  				goto errout;
>  
>  			break;
> @@ -736,8 +736,8 @@ static int mpls_dump_routes(struct sk_buff *skb, struct netlink_callback *cb)
>  	ASSERT_RTNL();
>  
>  	index = cb->args[0];
> -	if (index < 16)
> -		index = 16;
> +	if (index < LABEL_FIRST_UNRESERVED)
> +		index = LABEL_FIRST_UNRESERVED;
>  
>  	platform_label = rtnl_dereference(net->mpls.platform_label);
>  	platform_labels = net->mpls.platform_labels;
> diff --git a/net/mpls/internal.h b/net/mpls/internal.h
> index fb6de92..5732283 100644
> --- a/net/mpls/internal.h
> +++ b/net/mpls/internal.h
> @@ -9,6 +9,7 @@
>  #define LABEL_GAL			13 /* RFC5586 */
>  #define LABEL_OAM_ALERT			14 /* RFC3429 */
>  #define LABEL_EXTENSION			15 /* RFC7274 */
> +#define LABEL_FIRST_UNRESERVED		16 /* RFC3032 */
>  
>  
>  struct mpls_shim_hdr {


* Re: [PATCH net-next v2 2/5] mpls: Remove incorrect PHP comment
  2015-03-20 15:42   ` [PATCH net-next v2 2/5] mpls: Remove incorrect PHP comment Robert Shearman
@ 2015-03-22 19:12     ` Eric W. Biederman
  2015-03-23 11:32       ` Robert Shearman
  0 siblings, 1 reply; 68+ messages in thread
From: Eric W. Biederman @ 2015-03-22 19:12 UTC (permalink / raw)
  To: Robert Shearman; +Cc: davem, netdev

Robert Shearman <rshearma@brocade.com> writes:

> Popping the last label on the stack does not necessarily imply
> performing penultimate hop popping. There is no reason why this
> couldn't be the last hop in the network, so remove the comment.

So this change I will disagree with.

What the code implements is Penultimate hop popping.  Even if you send
the packets over loopback that is what the code is doing.

This is relevant because I think the code may actually be wrong in the
local reception case.  By performing penultimate hop popping and
receiving the packet on loopback I think this code allows bypassing
iptables rules that apply to incoming ip packets.  Certainly there is a
loss of information as to which hardware interface the packet came in on
that it may be desirable to correct.

Eric


> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: Robert Shearman <rshearma@brocade.com>
> ---
>  net/mpls/af_mpls.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index 0d6763a..bf3459a 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -199,7 +199,6 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>  	skb->protocol = htons(ETH_P_MPLS_UC);
>  
>  	if (unlikely(!new_header_size && dec.bos)) {
> -		/* Penultimate hop popping */
>  		if (!mpls_egress(rt, skb, dec))
>  			goto drop;
>  	} else {


* Re: [PATCH net-next v2 3/5] mpls: Differentiate implicit-null and unlabeled neighbours
  2015-03-20 15:42   ` [PATCH net-next v2 3/5] mpls: Differentiate implicit-null and unlabeled neighbours Robert Shearman
@ 2015-03-22 19:49     ` Eric W. Biederman
  2015-03-22 21:06       ` Eric W. Biederman
  0 siblings, 1 reply; 68+ messages in thread
From: Eric W. Biederman @ 2015-03-22 19:49 UTC (permalink / raw)
  To: Robert Shearman; +Cc: davem, netdev

Robert Shearman <rshearma@brocade.com> writes:

> The control plane can advertise labels for neighbours that don't have
> an outgoing label. RFC 3032 s3.22 states that either the remaining
> labels should be popped (if the control plane can determine that it's
> safe to do so, which in light of MPLS-VPN, RFC 4364, is never the case
> now) or that the packet should be discarded.

I can not figure out what you are referring to.  There is no section 3.2
in RFC3022.

> Therefore, if the peer is unlabeled and the last label wasn't popped
> then drop the packet. The peer being unlabeled is signalled by an
> empty label stack. However, implicit-null still needs to be supported
> (i.e. penultimate hop popping) where the incoming label is popped and
> no labels are put on and the packet can still go out labeled with the
> unpopped part of the stack. This is achieved by the control plane
> specifying a label stack consisting of the single special
> implicit-null value.

As I understand it you want to handle the case for a label for which
there is no next hop, and the packet should be black-holed.

In struct mpls_route such routes are currently represented by routes
that have no network device.  And in rtnetlink should be represented
with routes of type RTN_BLACKHOLE which I do not currently support
parsing.  But that should be simple enough to correct.

With respect to Implicit NULL it should be an error to accept a route
that has an RTA_NEWDST that includes an implicit NULL.

The rtnetlink is not ldp nor should it have ldp semantics and be made
complicated by those semantics.

The semantics of RTA_NEWDST are the labels to push on after the top most
label has been popped off.  I see no reason to include other mechanisms
into that processing when it is easy enough to add or tweak other
attributes to have those semantics.

Certainly it is not something that I think is worth special casing on
the fast path in mpls_forward.
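
One possible shape for the rejection suggested above, done at configuration time so the fast path stays untouched. This is only a sketch: check_output_labels() is a hypothetical name, and where such a check would live in the rtnetlink parsing code is an assumption.

```c
#include <errno.h>
#include <stdint.h>

#define LABEL_IMPLICIT_NULL 3 /* RFC3032 */

/* Hypothetical validator: refuse any output label stack that contains
 * implicit NULL, since that value is never sent on the wire.
 * Returns 0 if the stack is acceptable, -EINVAL otherwise.
 */
static int check_output_labels(const uint32_t *labels, unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++) {
		if (labels[i] == LABEL_IMPLICIT_NULL)
			return -EINVAL;
	}
	return 0;
}
```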

> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: Robert Shearman <rshearma@brocade.com>
> ---
>  net/mpls/af_mpls.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index bf3459a..e3586a7 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -28,7 +28,8 @@ struct mpls_route { /* next hop label forwarding entry */
>  	struct rcu_head		rt_rcu;
>  	u32			rt_label[MAX_NEW_LABELS];
>  	u8			rt_protocol; /* routing protocol that set this entry */
> -	u8			rt_labels;
> +	u8                      rt_unlabeled : 1;
> +	u8			rt_labels : 7;
>  	u8			rt_via_alen;
>  	u8			rt_via_table;
>  	u8			rt_via[0];
> @@ -201,6 +202,11 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>  	if (unlikely(!new_header_size && dec.bos)) {
>  		if (!mpls_egress(rt, skb, dec))
>  			goto drop;
> +	} else if (rt->rt_unlabeled) {
> +		/* Labeled traffic destined to unlabeled peer should
> +		 * be discarded
> +		 */
> +		goto drop;
>  	} else {
>  		bool bos;
>  		int i;
> @@ -385,9 +391,16 @@ static int mpls_route_add(struct mpls_route_config *cfg)
>  	if (!rt)
>  		goto errout;
>  
> -	rt->rt_labels = cfg->rc_output_labels;
> -	for (i = 0; i < rt->rt_labels; i++)
> -		rt->rt_label[i] = cfg->rc_output_label[i];
> +	if (cfg->rc_output_labels == 1 &&
> +	    cfg->rc_output_label[0] == LABEL_IMPLICIT_NULL) {
> +		rt->rt_labels = 0;
> +	} else {
> +		rt->rt_labels = cfg->rc_output_labels;
> +		for (i = 0; i < rt->rt_labels; i++)
> +			rt->rt_label[i] = cfg->rc_output_label[i];
> +		if (!rt->rt_labels)
> +			rt->rt_unlabeled = true;
> +	}
>  	rt->rt_protocol = cfg->rc_protocol;
>  	RCU_INIT_POINTER(rt->rt_dev, dev);
>  	rt->rt_via_table = cfg->rc_via_table;


* Re: [PATCH net-next v2 4/5] mpls: Per-device enabling of packet forwarding
  2015-03-20 15:42   ` [PATCH net-next v2 4/5] mpls: Per-device enabling of packet forwarding Robert Shearman
@ 2015-03-22 20:02     ` Eric W. Biederman
  2015-03-22 20:34       ` Eric W. Biederman
  2015-03-23 13:10       ` Robert Shearman
  0 siblings, 2 replies; 68+ messages in thread
From: Eric W. Biederman @ 2015-03-22 20:02 UTC (permalink / raw)
  To: Robert Shearman; +Cc: davem, netdev

Robert Shearman <rshearma@brocade.com> writes:

> An MPLS network is a single trust domain where the edges must be in
> control of what labels make their way into the core. The simplest way
> of ensuring this is for the edge device to always impose the labels, and
> not allow forwarding of labeled traffic from untrusted neighbours. This is
> achieved by allowing a per-device configuration of whether MPLS
> traffic received over that interface should be forwarded or not.
>
> To be secure by default, MPLS is now initially disabled on all
> interfaces (except the loopback) until explicitly enabled and no
> global option is provided to change the default. Whilst this differs
> from other protocols (e.g. IPv6), network operators are used to
> explicitly enabling MPLS forwarding on interfaces, and with the number
> of links to the MPLS core typically fairly low this doesn't present
> too much of a burden on operators.

Overall this patch looks like the correct direction to go.

And a default of disabled is the right way to go for new features: that
way, even if the code is compiled in, people don't get surprised by new
behavior when they upgrade kernels.

It would be very nice if the check for ARPHRD types were moved from
mpls_route_add to mpls_add_dev, which would save memory and complexity
when mpls is not supported on a network device type.
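
A minimal userspace sketch of that suggestion follows.  It is a model,
not kernel code: the MODEL_ARPHRD_* constants, struct mpls_dev_model and
both function names are hypothetical, and the assumption that only
Ethernet and loopback devices are supported is made for illustration.
The point is only that an unsupported device type returns before any
per-device state is allocated.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Values mirror ARPHRD_ETHER and ARPHRD_LOOPBACK from linux/if_arp.h;
 * redefined here so the sketch builds without kernel headers.
 */
#define MODEL_ARPHRD_ETHER    1
#define MODEL_ARPHRD_LOOPBACK 772

/* Hypothetical stand-in for struct mpls_dev */
struct mpls_dev_model {
	int fwd_enabled;
};

/* Assumption: only these two device types can carry MPLS */
static bool mpls_dev_type_supported(unsigned short type)
{
	return type == MODEL_ARPHRD_ETHER || type == MODEL_ARPHRD_LOOPBACK;
}

/* Returns NULL without allocating when the device type is unsupported,
 * so such devices carry no per-device MPLS state at all.
 */
static struct mpls_dev_model *mpls_add_dev_model(unsigned short type)
{
	if (!mpls_dev_type_supported(type))
		return NULL;

	return calloc(1, sizeof(struct mpls_dev_model));
}
```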

Eric

> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: Robert Shearman <rshearma@brocade.com>
> ---
>  Documentation/networking/mpls-sysctl.txt |   9 +++
>  include/linux/netdevice.h                |   4 ++
>  net/mpls/af_mpls.c                       | 115 ++++++++++++++++++++++++++++++-
>  net/mpls/internal.h                      |   6 ++
>  4 files changed, 133 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/networking/mpls-sysctl.txt b/Documentation/networking/mpls-sysctl.txt
> index 639ddf0..f48772c 100644
> --- a/Documentation/networking/mpls-sysctl.txt
> +++ b/Documentation/networking/mpls-sysctl.txt
> @@ -18,3 +18,12 @@ platform_labels - INTEGER
>  
>  	Possible values: 0 - 1048575
>  	Default: 0
> +
> +conf/<interface>/forwarding - BOOL
> +	Forward packets received on this interface.
> +
> +	If disabled, packets will be discarded without further
> +	processing.
> +
> +	0 - disabled (default)
> +	not 0 - enabled
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 76951c5..ee4ca06 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -60,6 +60,7 @@ struct phy_device;
>  struct wireless_dev;
>  /* 802.15.4 specific */
>  struct wpan_dev;
> +struct mpls_dev;
>  
>  void netdev_set_default_ethtool_ops(struct net_device *dev,
>  				    const struct ethtool_ops *ops);
> @@ -1615,6 +1616,9 @@ struct net_device {
>  	void			*ax25_ptr;
>  	struct wireless_dev	*ieee80211_ptr;
>  	struct wpan_dev		*ieee802154_ptr;
> +#if IS_ENABLED(CONFIG_MPLS_ROUTING)
> +	struct mpls_dev __rcu	*mpls_ptr;
> +#endif
>  
>  /*
>   * Cache lines mostly used on receive path (including eth_type_trans())
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index e3586a7..14c7e76 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -54,6 +54,11 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
>  	return rt;
>  }
>  
> +static inline struct mpls_dev *mpls_dev_get(const struct net_device *dev)
> +{
> +	return rcu_dereference_rtnl(dev->mpls_ptr);
> +}
> +
>  static bool mpls_output_possible(const struct net_device *dev)
>  {
>  	return dev && (dev->flags & IFF_UP) && netif_carrier_ok(dev);
> @@ -137,6 +142,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>  	struct mpls_route *rt;
>  	struct mpls_entry_decoded dec;
>  	struct net_device *out_dev;
> +	struct mpls_dev *mdev;
>  	unsigned int hh_len;
>  	unsigned int new_header_size;
>  	unsigned int mtu;
> @@ -144,6 +150,10 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>  
>  	/* Careful this entire function runs inside of an rcu critical section */
>  
> +	mdev = mpls_dev_get(dev);
> +	if (!mdev || !mdev->fwd_enabled)
> +		goto drop;
> +
>  	if (skb->pkt_type != PACKET_HOST)
>  		goto drop;
>  
> @@ -440,10 +450,96 @@ errout:
>  	return err;
>  }
>  
> +#define MPLS_PERDEV_SYSCTL_OFFSET(field)	\
> +	(&((struct mpls_dev *)0)->field)
> +
> +static const struct ctl_table mpls_dev_table[] = {
> +	{
> +		.procname	= "forwarding",
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec,
> +		.data		= MPLS_PERDEV_SYSCTL_OFFSET(fwd_enabled),
> +	},
> +	{ }
> +};
> +
> +static int mpls_dev_sysctl_register(struct net_device *dev,
> +				    struct mpls_dev *mdev)
> +{
> +	char path[sizeof("net/mpls/conf/") + IFNAMSIZ];
> +	struct ctl_table *table;
> +	int i;
> +
> +	table = kmemdup(&mpls_dev_table, sizeof(mpls_dev_table), GFP_KERNEL);
> +	if (!table)
> +		goto out;
> +
> +	/* Table data contains only offsets relative to the base of
> +	 * the mdev at this point, so make them absolute.
> +	 */
> +	for (i = 0; i < ARRAY_SIZE(mpls_dev_table); i++)
> +		table[i].data = (char *)mdev + (uintptr_t)table[i].data;
> +
> +	snprintf(path, sizeof(path), "net/mpls/conf/%s", dev->name);
> +
> +	mdev->sysctl = register_net_sysctl(dev_net(dev), path, table);
> +	if (!mdev->sysctl)
> +		goto free;
> +
> +	return 0;
> +
> +free:
> +	kfree(table);
> +out:
> +	return -ENOBUFS;
> +}
> +
> +static void mpls_dev_sysctl_unregister(struct mpls_dev *mdev)
> +{
> +	struct ctl_table *table;
> +
> +	table = mdev->sysctl->ctl_table_arg;
> +	unregister_net_sysctl_table(mdev->sysctl);
> +	kfree(table);
> +}
> +
> +static struct mpls_dev *mpls_add_dev(struct net_device *dev)
> +{
> +	struct mpls_dev *mdev;
> +	int err = -ENOMEM;
> +
> +	ASSERT_RTNL();
> +
> +	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
> +	if (!mdev)
> +		return ERR_PTR(err);
> +
> +	/* Enable MPLS by default on loopback devices, since this
> +	 * doesn't represent a security boundary and is required for the
> +	 * lookup of inner labels for LSPs terminating on this router.
> +	 */
> +	if (dev->flags & IFF_LOOPBACK)
> +		mdev->fwd_enabled = 1;
> +
> +	err = mpls_dev_sysctl_register(dev, mdev);
> +	if (err)
> +		goto free;
> +
> +	rcu_assign_pointer(dev->mpls_ptr, mdev);
> +
> +	return mdev;
> +
> +free:
> +	kfree(mdev);
> +	return ERR_PTR(err);
> +}
> +
>  static void mpls_ifdown(struct net_device *dev)
>  {
>  	struct mpls_route __rcu **platform_label;
>  	struct net *net = dev_net(dev);
> +	struct mpls_dev *mdev;
>  	unsigned index;
>  
>  	platform_label = rtnl_dereference(net->mpls.platform_label);
> @@ -455,14 +551,31 @@ static void mpls_ifdown(struct net_device *dev)
>  			continue;
>  		rt->rt_dev = NULL;
>  	}
> +
> +	mdev = mpls_dev_get(dev);
> +	if (!mdev)
> +		return;
> +
> +	mpls_dev_sysctl_unregister(mdev);
> +
> +	RCU_INIT_POINTER(dev->mpls_ptr, NULL);
> +
> +	kfree(mdev);
>  }
>  
>  static int mpls_dev_notify(struct notifier_block *this, unsigned long event,
>  			   void *ptr)
>  {
>  	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> +	struct mpls_dev *mdev;
>  
>  	switch(event) {
> +	case NETDEV_REGISTER:
> +		mdev = mpls_add_dev(dev);
> +		if (IS_ERR(mdev))
> +			return notifier_from_errno(PTR_ERR(mdev));
> +		break;
> +
>  	case NETDEV_UNREGISTER:
>  		mpls_ifdown(dev);
>  		break;
> @@ -924,7 +1037,7 @@ static int mpls_platform_labels(struct ctl_table *table, int write,
>  	return ret;
>  }
>  
> -static struct ctl_table mpls_table[] = {
> +static const struct ctl_table mpls_table[] = {
>  	{
>  		.procname	= "platform_labels",
>  		.data		= NULL,
> diff --git a/net/mpls/internal.h b/net/mpls/internal.h
> index 5732283..e676a43 100644
> --- a/net/mpls/internal.h
> +++ b/net/mpls/internal.h
> @@ -23,6 +23,12 @@ struct mpls_entry_decoded {
>  	u8 bos;
>  };
>  
> +struct mpls_dev {
> +	int			fwd_enabled;
> +
> +	struct ctl_table_header *sysctl;
> +};
> +
>  struct sk_buff;
>  
>  static inline struct mpls_shim_hdr *mpls_hdr(const struct sk_buff *skb)

* Re: [PATCH net-next v2 4/5] mpls: Per-device enabling of packet forwarding
  2015-03-22 20:02     ` Eric W. Biederman
@ 2015-03-22 20:34       ` Eric W. Biederman
  2015-03-23 13:42         ` Robert Shearman
  2015-03-23 13:10       ` Robert Shearman
  1 sibling, 1 reply; 68+ messages in thread
From: Eric W. Biederman @ 2015-03-22 20:34 UTC (permalink / raw)
  To: Robert Shearman; +Cc: davem, netdev

ebiederm@xmission.com (Eric W. Biederman) writes:

> Robert Shearman <rshearma@brocade.com> writes:
>
>> An MPLS network is a single trust domain where the edges must be in
>> control of what labels make their way into the core. The simplest way
>> of ensuring this is for the edge device to always impose the labels, and
>> not allow forwarding of labeled traffic from untrusted neighbours. This is
>> achieved by allowing a per-device configuration of whether MPLS
>> traffic received over that interface should be forwarded or not.
>>
>> To be secure by default, MPLS is now initially disabled on all
>> interfaces (except the loopback) until explicitly enabled and no
>> global option is provided to change the default. Whilst this differs
>> from other protocols (e.g. IPv6), network operators are used to
>> explicitly enabling MPLS forwarding on interfaces, and with the number
>> of links to the MPLS core typically fairly low this doesn't present
>> too much of a burden on operators.
>
> Overall this patch looks like the correct direction to go.
>
> And a default disable is the right way to go for new features, that way
> even if the code is compiled in people don't get surprised by new
> behavior when they upgrade kernels.
>
> It would be very nice if the check for ARPHRD types was moved from
> mpls_route_add to mpls_add_dev.  Which would save memory and complexity
> when mpls is not supported on a network device type.

There is also the question of whether we want "forwarding" to be the
parameter we are controlling.  The other option is not "forwarding" but
mpls "enable".

Completely disabling mpls on a device might be too strong as it would
presumably work for output as well as input.

Forwarding, at least for ipv4 and ipv6, has the semantic that you can
still accept packets that are routed to yourself, which your
implementation of forwarding does not allow.

So I expect what we actually want here is either "enable" or two
knobs "input" and "output".
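
A rough userspace sketch of the two-knob variant, to show how the checks
would compose.  Everything here is hypothetical: struct mpls_dev_model,
its input_enabled and output_enabled fields, and the helper names do not
exist in the posted patch, which has a single fwd_enabled field.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device state with separate knobs */
struct mpls_dev_model {
	bool input_enabled;	/* accept labeled packets received here */
	bool output_enabled;	/* allow labeled packets to be sent here */
};

/* A device with no MPLS state is treated as disabled */
static bool mpls_input_allowed(const struct mpls_dev_model *mdev)
{
	return mdev != NULL && mdev->input_enabled;
}

static bool mpls_output_allowed(const struct mpls_dev_model *mdev)
{
	return mdev != NULL && mdev->output_enabled;
}

/* Forwarding needs input on the receiving device and output on the
 * transmitting device.
 */
static bool mpls_may_forward(const struct mpls_dev_model *in,
			     const struct mpls_dev_model *out)
{
	return mpls_input_allowed(in) && mpls_output_allowed(out);
}
```

With this shape an edge-facing interface can keep "input" off while
still being usable for output towards an untrusted neighbour.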

Eric


>> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
>> Signed-off-by: Robert Shearman <rshearma@brocade.com>
>> ---
>>  Documentation/networking/mpls-sysctl.txt |   9 +++
>>  include/linux/netdevice.h                |   4 ++
>>  net/mpls/af_mpls.c                       | 115 ++++++++++++++++++++++++++++++-
>>  net/mpls/internal.h                      |   6 ++
>>  4 files changed, 133 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/networking/mpls-sysctl.txt b/Documentation/networking/mpls-sysctl.txt
>> index 639ddf0..f48772c 100644
>> --- a/Documentation/networking/mpls-sysctl.txt
>> +++ b/Documentation/networking/mpls-sysctl.txt
>> @@ -18,3 +18,12 @@ platform_labels - INTEGER
>>  
>>  	Possible values: 0 - 1048575
>>  	Default: 0
>> +
>> +conf/<interface>/forwarding - BOOL
>> +	Forward packets received on this interface.
>> +
>> +	If disabled, packets will be discarded without further
>> +	processing.
>> +
>> +	0 - disabled (default)
>> +	not 0 - enabled
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 76951c5..ee4ca06 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -60,6 +60,7 @@ struct phy_device;
>>  struct wireless_dev;
>>  /* 802.15.4 specific */
>>  struct wpan_dev;
>> +struct mpls_dev;
>>  
>>  void netdev_set_default_ethtool_ops(struct net_device *dev,
>>  				    const struct ethtool_ops *ops);
>> @@ -1615,6 +1616,9 @@ struct net_device {
>>  	void			*ax25_ptr;
>>  	struct wireless_dev	*ieee80211_ptr;
>>  	struct wpan_dev		*ieee802154_ptr;
>> +#if IS_ENABLED(CONFIG_MPLS_ROUTING)
>> +	struct mpls_dev __rcu	*mpls_ptr;
>> +#endif
>>  
>>  /*
>>   * Cache lines mostly used on receive path (including eth_type_trans())
>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>> index e3586a7..14c7e76 100644
>> --- a/net/mpls/af_mpls.c
>> +++ b/net/mpls/af_mpls.c
>> @@ -54,6 +54,11 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
>>  	return rt;
>>  }
>>  
>> +static inline struct mpls_dev *mpls_dev_get(const struct net_device *dev)
>> +{
>> +	return rcu_dereference_rtnl(dev->mpls_ptr);
>> +}
>> +
>>  static bool mpls_output_possible(const struct net_device *dev)
>>  {
>>  	return dev && (dev->flags & IFF_UP) && netif_carrier_ok(dev);
>> @@ -137,6 +142,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>>  	struct mpls_route *rt;
>>  	struct mpls_entry_decoded dec;
>>  	struct net_device *out_dev;
>> +	struct mpls_dev *mdev;
>>  	unsigned int hh_len;
>>  	unsigned int new_header_size;
>>  	unsigned int mtu;
>> @@ -144,6 +150,10 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>>  
>>  	/* Careful this entire function runs inside of an rcu critical section */
>>  
>> +	mdev = mpls_dev_get(dev);
>> +	if (!mdev || !mdev->fwd_enabled)
>> +		goto drop;
>> +
>>  	if (skb->pkt_type != PACKET_HOST)
>>  		goto drop;
>>  
>> @@ -440,10 +450,96 @@ errout:
>>  	return err;
>>  }
>>  
>> +#define MPLS_PERDEV_SYSCTL_OFFSET(field)	\
>> +	(&((struct mpls_dev *)0)->field)
>> +
>> +static const struct ctl_table mpls_dev_table[] = {
>> +	{
>> +		.procname	= "forwarding",
>> +		.maxlen		= sizeof(int),
>> +		.mode		= 0644,
>> +		.proc_handler	= proc_dointvec,
>> +		.data		= MPLS_PERDEV_SYSCTL_OFFSET(fwd_enabled),
>> +	},
>> +	{ }
>> +};
>> +
>> +static int mpls_dev_sysctl_register(struct net_device *dev,
>> +				    struct mpls_dev *mdev)
>> +{
>> +	char path[sizeof("net/mpls/conf/") + IFNAMSIZ];
>> +	struct ctl_table *table;
>> +	int i;
>> +
>> +	table = kmemdup(&mpls_dev_table, sizeof(mpls_dev_table), GFP_KERNEL);
>> +	if (!table)
>> +		goto out;
>> +
>> +	/* Table data contains only offsets relative to the base of
>> +	 * the mdev at this point, so make them absolute.
>> +	 */
>> +	for (i = 0; i < ARRAY_SIZE(mpls_dev_table); i++)
>> +		table[i].data = (char *)mdev + (uintptr_t)table[i].data;
>> +
>> +	snprintf(path, sizeof(path), "net/mpls/conf/%s", dev->name);
>> +
>> +	mdev->sysctl = register_net_sysctl(dev_net(dev), path, table);
>> +	if (!mdev->sysctl)
>> +		goto free;
>> +
>> +	return 0;
>> +
>> +free:
>> +	kfree(table);
>> +out:
>> +	return -ENOBUFS;
>> +}
>> +
>> +static void mpls_dev_sysctl_unregister(struct mpls_dev *mdev)
>> +{
>> +	struct ctl_table *table;
>> +
>> +	table = mdev->sysctl->ctl_table_arg;
>> +	unregister_net_sysctl_table(mdev->sysctl);
>> +	kfree(table);
>> +}
>> +
>> +static struct mpls_dev *mpls_add_dev(struct net_device *dev)
>> +{
>> +	struct mpls_dev *mdev;
>> +	int err = -ENOMEM;
>> +
>> +	ASSERT_RTNL();
>> +
>> +	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
>> +	if (!mdev)
>> +		return ERR_PTR(err);
>> +
>> +	/* Enable MPLS by default on loopback devices, since this
>> +	 * doesn't represent a security boundary and is required for the
>> +	 * lookup of inner labels for LSPs terminating on this router.
>> +	 */
>> +	if (dev->flags & IFF_LOOPBACK)
>> +		mdev->fwd_enabled = 1;
>> +
>> +	err = mpls_dev_sysctl_register(dev, mdev);
>> +	if (err)
>> +		goto free;
>> +
>> +	rcu_assign_pointer(dev->mpls_ptr, mdev);
>> +
>> +	return mdev;
>> +
>> +free:
>> +	kfree(mdev);
>> +	return ERR_PTR(err);
>> +}
>> +
>>  static void mpls_ifdown(struct net_device *dev)
>>  {
>>  	struct mpls_route __rcu **platform_label;
>>  	struct net *net = dev_net(dev);
>> +	struct mpls_dev *mdev;
>>  	unsigned index;
>>  
>>  	platform_label = rtnl_dereference(net->mpls.platform_label);
>> @@ -455,14 +551,31 @@ static void mpls_ifdown(struct net_device *dev)
>>  			continue;
>>  		rt->rt_dev = NULL;
>>  	}
>> +
>> +	mdev = mpls_dev_get(dev);
>> +	if (!mdev)
>> +		return;
>> +
>> +	mpls_dev_sysctl_unregister(mdev);
>> +
>> +	RCU_INIT_POINTER(dev->mpls_ptr, NULL);
>> +
>> +	kfree(mdev);
>>  }
>>  
>>  static int mpls_dev_notify(struct notifier_block *this, unsigned long event,
>>  			   void *ptr)
>>  {
>>  	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
>> +	struct mpls_dev *mdev;
>>  
>>  	switch(event) {
>> +	case NETDEV_REGISTER:
>> +		mdev = mpls_add_dev(dev);
>> +		if (IS_ERR(mdev))
>> +			return notifier_from_errno(PTR_ERR(mdev));
>> +		break;
>> +
>>  	case NETDEV_UNREGISTER:
>>  		mpls_ifdown(dev);
>>  		break;
>> @@ -924,7 +1037,7 @@ static int mpls_platform_labels(struct ctl_table *table, int write,
>>  	return ret;
>>  }
>>  
>> -static struct ctl_table mpls_table[] = {
>> +static const struct ctl_table mpls_table[] = {
>>  	{
>>  		.procname	= "platform_labels",
>>  		.data		= NULL,
>> diff --git a/net/mpls/internal.h b/net/mpls/internal.h
>> index 5732283..e676a43 100644
>> --- a/net/mpls/internal.h
>> +++ b/net/mpls/internal.h
>> @@ -23,6 +23,12 @@ struct mpls_entry_decoded {
>>  	u8 bos;
>>  };
>>  
>> +struct mpls_dev {
>> +	int			fwd_enabled;
>> +
>> +	struct ctl_table_header *sysctl;
>> +};
>> +
>>  struct sk_buff;
>>  
>>  static inline struct mpls_shim_hdr *mpls_hdr(const struct sk_buff *skb)

* Re: [PATCH net-next v2 5/5] mpls: Allow payload type to be associated with label routes
  2015-03-20 15:42   ` [PATCH net-next v2 5/5] mpls: Allow payload type to be associated with label routes Robert Shearman
@ 2015-03-22 20:56     ` Eric W. Biederman
  2015-03-23 14:02       ` Robert Shearman
  0 siblings, 1 reply; 68+ messages in thread
From: Eric W. Biederman @ 2015-03-22 20:56 UTC (permalink / raw)
  To: Robert Shearman; +Cc: davem, netdev

Robert Shearman <rshearma@brocade.com> writes:

> RFC 4182 s2 states that if an IPv4 Explicit NULL label is the only
> label on the stack, then after popping the resulting packet must be
> treated as a IPv4 packet and forwarded based on the IPv4 header. The
> same is true for IPv6 Explicit NULL with an IPv6 packet following.
>
> Therefore, when installing the IPv4/IPv6 Explicit NULL label routes,
> add an attribute that specifies the expected payload type for use at
> forwarding time for determining the type of the encapsulated packet
> instead of inspecting the first nibble of the packet.

So this patch is not wrong.  And at a practical level it is a good
idea to enforce ipv4 when the ipv4 explicit null label is present
and similarly with ipv6.

I do have some quibbles.

First I want to point out that RFC3032 section 2.2 talks about using
a label in combination with the packet's contents to figure out the
type of packet that is being transmitted.  IPv4 and IPv6 do count as a
set of network layer protocols that can be distinguished by inspection
of the network layer header.

Changing mpls_egress to mpls_bos_egress bothers me a little, because it
seems redundant.  But I can see an argument for that name change.

I think it would be cleaner if we set MPT_IPV4 = 4 and MPT_IPV6 = 6,
which would remove the switch statement in mpls_pkt_determine_af.
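
As a minimal sketch of that idea, assuming the enum values are made
equal to the IP version nibble: the function below is a userspace
stand-in for mpls_pkt_determine_af that takes the first octet of the
payload rather than an skb.  Its name and signature are hypothetical.

```c
#include <stdint.h>

/* Values chosen to match the IP version field; both fit in the
 * 3-bit rt_payload_type bitfield.
 */
enum mpls_payload_type {
	MPT_UNSPEC = 0,	/* IPv4 or IPv6, decided by inspection */
	MPT_IPV4 = 4,
	MPT_IPV6 = 6,
};

/* Hypothetical stand-in for mpls_pkt_determine_af(): the version
 * nibble is used directly as the payload type, with no switch.
 */
static enum mpls_payload_type mpls_determine_af_model(uint8_t first_octet)
{
	uint8_t version = first_octet >> 4;

	if (version == MPT_IPV4 || version == MPT_IPV6)
		return (enum mpls_payload_type)version;

	/* e.g. versions 0 and 1, which are used by pseudo wires */
	return MPT_UNSPEC;
}
```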

You delete my big fat comment referring people to how packets are
encoded in mpls.  That seems unfortunate, because it can be easy to get
lost in the MPLS rfcs, and I am certain someone will want to do more
than support IPv4 and IPv6.

Given the number of pseudo wire types I do believe that 3 bits is going
to be too small to encode everything going forward. 

> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: Robert Shearman <rshearma@brocade.com>
> ---
>  net/mpls/af_mpls.c | 87 ++++++++++++++++++++++++++++++++++--------------------
>  1 file changed, 55 insertions(+), 32 deletions(-)
>
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index 14c7e76..653bae1 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -23,13 +23,20 @@
>  /* This maximum ha length copied from the definition of struct neighbour */
>  #define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
>  
> +enum mpls_payload_type {
> +	MPT_UNSPEC, /* IPv4 or IPv6 */
> +	MPT_IPV4,
> +	MPT_IPV6,
> +};
> +
>  struct mpls_route { /* next hop label forwarding entry */
>  	struct net_device __rcu *rt_dev;
>  	struct rcu_head		rt_rcu;
>  	u32			rt_label[MAX_NEW_LABELS];
>  	u8			rt_protocol; /* routing protocol that set this entry */
>  	u8                      rt_unlabeled : 1;
> -	u8			rt_labels : 7;
> +	u8                      rt_payload_type : 3;
> +	u8			rt_labels : 4;
>  	u8			rt_via_alen;
>  	u8			rt_via_table;
>  	u8			rt_via[0];
> @@ -87,19 +94,24 @@ static bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
>  	return true;
>  }
>  
> -static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
> -			struct mpls_entry_decoded dec)
> +static enum mpls_payload_type mpls_pkt_determine_af(struct sk_buff *skb)
>  {
> -	/* RFC4385 and RFC5586 encode other packets in mpls such that
> -	 * they don't conflict with the ip version number, making
> -	 * decoding by examining the ip version correct in everything
> -	 * except for the strangest cases.
> -	 *
> -	 * The strange cases if we choose to support them will require
> -	 * manual configuration.
> -	 */
> -	struct iphdr *hdr4;
> -	bool success = true;
> +	struct iphdr *hdr4 = ip_hdr(skb);
> +
> +	switch (hdr4->version) {
> +	case 4:
> +		return MPT_IPV4;
> +	case 6:
> +		return MPT_IPV6;
> +	}
> +
> +	return MPT_UNSPEC;
> +}
> +
> +static bool mpls_bos_egress(struct mpls_route *rt, struct sk_buff *skb,
> +			    struct mpls_entry_decoded dec)
> +{
> +	enum mpls_payload_type payload_type;
>  
>  	/* The IPv4 code below accesses through the IPv4 header
>  	 * checksum, which is 12 bytes into the packet.
> @@ -114,24 +126,31 @@ static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
>  	if (!pskb_may_pull(skb, 12))
>  		return false;
>  
> -	/* Use ip_hdr to find the ip protocol version */
> -	hdr4 = ip_hdr(skb);
> -	if (hdr4->version == 4) {
> +	payload_type = rt->rt_payload_type;
> +	if (payload_type == MPT_UNSPEC)
> +		payload_type = mpls_pkt_determine_af(skb);
> +
> +	switch (payload_type) {
> +	case MPT_IPV4: {
> +		struct iphdr *hdr4 = ip_hdr(skb);
>  		skb->protocol = htons(ETH_P_IP);
>  		csum_replace2(&hdr4->check,
>  			      htons(hdr4->ttl << 8),
>  			      htons(dec.ttl << 8));
>  		hdr4->ttl = dec.ttl;
> +		return true;
>  	}
> -	else if (hdr4->version == 6) {
> +	case MPT_IPV6: {
>  		struct ipv6hdr *hdr6 = ipv6_hdr(skb);
>  		skb->protocol = htons(ETH_P_IPV6);
>  		hdr6->hop_limit = dec.ttl;
> +		return true;
>  	}
> -	else
> -		/* version 0 and version 1 are used by pseudo wires */
> -		success = false;
> -	return success;
> +	case MPT_UNSPEC:
> +		break;
> +	}
> +
> +	return false;
>  }
>  
>  static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
> @@ -210,7 +229,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>  	skb->protocol = htons(ETH_P_MPLS_UC);
>  
>  	if (unlikely(!new_header_size && dec.bos)) {
> -		if (!mpls_egress(rt, skb, dec))
> +		if (!mpls_bos_egress(rt, skb, dec))
>  			goto drop;
>  	} else if (rt->rt_unlabeled) {
>  		/* Labeled traffic destined to unlabeled peer should
> @@ -253,16 +272,17 @@ static const struct nla_policy rtm_mpls_policy[RTA_MAX+1] = {
>  };
>  
>  struct mpls_route_config {
> -	u32		rc_protocol;
> -	u32		rc_ifindex;
> -	u16		rc_via_table;
> -	u16		rc_via_alen;
> -	u8		rc_via[MAX_VIA_ALEN];
> -	u32		rc_label;
> -	u32		rc_output_labels;
> -	u32		rc_output_label[MAX_NEW_LABELS];
> -	u32		rc_nlflags;
> -	struct nl_info	rc_nlinfo;
> +	u32			rc_protocol;
> +	u32			rc_ifindex;
> +	u16			rc_via_table;
> +	u16			rc_via_alen;
> +	u8			rc_via[MAX_VIA_ALEN];
> +	u32			rc_label;
> +	u32			rc_output_labels;
> +	u32			rc_output_label[MAX_NEW_LABELS];
> +	u32			rc_nlflags;
> +	enum mpls_payload_type	rc_payload_type;
> +	struct nl_info		rc_nlinfo;
>  };
>  
>  static struct mpls_route *mpls_rt_alloc(size_t alen)
> @@ -413,6 +433,7 @@ static int mpls_route_add(struct mpls_route_config *cfg)
>  	}
>  	rt->rt_protocol = cfg->rc_protocol;
>  	RCU_INIT_POINTER(rt->rt_dev, dev);
> +	rt->rt_payload_type = cfg->rc_payload_type;
>  	rt->rt_via_table = cfg->rc_via_table;
>  	memcpy(rt->rt_via, cfg->rc_via, cfg->rc_via_alen);
>  
> @@ -948,6 +969,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
>  			goto nort0;
>  		RCU_INIT_POINTER(rt0->rt_dev, lo);
>  		rt0->rt_protocol = RTPROT_KERNEL;
> +		rt0->rt_payload_type = MPT_IPV4;
>  		rt0->rt_via_table = NEIGH_LINK_TABLE;
>  		memcpy(rt0->rt_via, lo->dev_addr, lo->addr_len);
>  	}
> @@ -958,6 +980,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
>  			goto nort2;
>  		RCU_INIT_POINTER(rt2->rt_dev, lo);
>  		rt2->rt_protocol = RTPROT_KERNEL;
> +		rt2->rt_payload_type = MPT_IPV6;
>  		rt2->rt_via_table = NEIGH_LINK_TABLE;
>  		memcpy(rt2->rt_via, lo->dev_addr, lo->addr_len);
>  	}

* Re: [PATCH net-next v2 3/5] mpls: Differentiate implicit-null and unlabeled neighbours
  2015-03-22 19:49     ` Eric W. Biederman
@ 2015-03-22 21:06       ` Eric W. Biederman
  2015-03-23 11:47         ` Robert Shearman
  0 siblings, 1 reply; 68+ messages in thread
From: Eric W. Biederman @ 2015-03-22 21:06 UTC (permalink / raw)
  To: Robert Shearman; +Cc: davem, netdev

ebiederm@xmission.com (Eric W. Biederman) writes:

> Robert Shearman <rshearma@brocade.com> writes:
>
>> The control plane can advertise labels for neighbours that don't have
>> an outgoing label. RFC 3032 s3.22 states that either the remaining
>> labels should be popped (if the control plane can determine that it's
>> safe to do so, which in light of MPLS-VPN, RFC 4364, is never the case
>> now) or that the packet should be discarded.
>
> I can not figure out what you are referring to.  There is no section 3.2
> in RFC3022.

I have found it.  That is RFC3031 Section 3.22.  This is something
the code already does.  If the label can not be looked up with
mpls_route_input_rcu the packet is dropped.

Beyond that I believe the rest of my comments still stand.  If you want
to do this explicitly some form of explicit blackhole route needs to be
supported.  Either just allowing a route to be configured with no output
device or an explicit RTN_BLACKHOLE route.

>> Therefore, if the peer is unlabeled and the last label wasn't popped
>> then drop the packet. The peer being unlabeled is signalled by an
>> empty label stack. However, implicit-null still needs to be supported
>> (i.e. penultimate hop popping) where the incoming label is popped and
>> no labels are put on and the packet can still go out labeled with the
>> unpopped part of the stack. This is achieved by the control plane
>> specifying a label stack consisting of the single special
>> implicit-null value.
>
> As I understand it you want to handle the case for a label for which
> there is no next hop, and the packet should be black-holed.
>
> In struct mpls_route such routes are currently represented by routes
> that have no network device.  And in rtnetlink should be represented
> with routes of type RTN_BLACKHOLE which I do not currently support
> parsing.  But that should be simple enough to correct.
>
> With respect to Implicit NULL it should be an error to accept a route
> that has an RTA_NEWDST that includes an implicit NULL.
>
> The rtnetlink is not ldp nor should it have ldp semantics and be made
> complicated by those semantics.
>
> The semantics of RTA_NEWDST are the labels to push on after the top most
> label has been popped off.  I see no reason to include other mechanisms
> into that processing when it is easy enough to add or tweak other
> attributes to have those semantics.
>
> Certainly it is not something that I think is worth special casing on
> the fast path in mpls_forward.
>
>> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
>> Signed-off-by: Robert Shearman <rshearma@brocade.com>
>> ---
>>  net/mpls/af_mpls.c | 21 +++++++++++++++++----
>>  1 file changed, 17 insertions(+), 4 deletions(-)
>>
>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>> index bf3459a..e3586a7 100644
>> --- a/net/mpls/af_mpls.c
>> +++ b/net/mpls/af_mpls.c
>> @@ -28,7 +28,8 @@ struct mpls_route { /* next hop label forwarding entry */
>>  	struct rcu_head		rt_rcu;
>>  	u32			rt_label[MAX_NEW_LABELS];
>>  	u8			rt_protocol; /* routing protocol that set this entry */
>> -	u8			rt_labels;
>> +	u8                      rt_unlabeled : 1;
>> +	u8			rt_labels : 7;
>>  	u8			rt_via_alen;
>>  	u8			rt_via_table;
>>  	u8			rt_via[0];
>> @@ -201,6 +202,11 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>>  	if (unlikely(!new_header_size && dec.bos)) {
>>  		if (!mpls_egress(rt, skb, dec))
>>  			goto drop;
>> +	} else if (rt->rt_unlabeled) {
>> +		/* Labeled traffic destined to unlabeled peer should
>> +		 * be discarded
>> +		 */
>> +		goto drop;
>>  	} else {
>>  		bool bos;
>>  		int i;
>> @@ -385,9 +391,16 @@ static int mpls_route_add(struct mpls_route_config *cfg)
>>  	if (!rt)
>>  		goto errout;
>>  
>> -	rt->rt_labels = cfg->rc_output_labels;
>> -	for (i = 0; i < rt->rt_labels; i++)
>> -		rt->rt_label[i] = cfg->rc_output_label[i];
>> +	if (cfg->rc_output_labels == 1 &&
>> +	    cfg->rc_output_label[0] == LABEL_IMPLICIT_NULL) {
>> +		rt->rt_labels = 0;
>> +	} else {
>> +		rt->rt_labels = cfg->rc_output_labels;
>> +		for (i = 0; i < rt->rt_labels; i++)
>> +			rt->rt_label[i] = cfg->rc_output_label[i];
>> +		if (!rt->rt_labels)
>> +			rt->rt_unlabeled = true;
>> +	}
>>  	rt->rt_protocol = cfg->rc_protocol;
>>  	RCU_INIT_POINTER(rt->rt_dev, dev);
>>  	rt->rt_via_table = cfg->rc_via_table;

* Re: [PATCH net-next v2 2/5] mpls: Remove incorrect PHP comment
  2015-03-22 19:12     ` Eric W. Biederman
@ 2015-03-23 11:32       ` Robert Shearman
  2015-03-23 18:16         ` Eric W. Biederman
  0 siblings, 1 reply; 68+ messages in thread
From: Robert Shearman @ 2015-03-23 11:32 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: davem, netdev

On 22/03/15 19:12, Eric W. Biederman wrote:
> Robert Shearman <rshearma@brocade.com> writes:
>
>> Popping the last label on the stack does not necessarily imply
>> performing penultimate hop popping. There is no reason why this
>> couldn't be the last hop in the network, so remove the comment.
>
> So this change I will disagree with.
>
> What the code implements is Penultimate hop popping.  Even if you send
> the packets over loopback that is what the code is doing.

No, RFC3031 s3.16 (https://tools.ietf.org/html/rfc3031#page-18) talks in 
terms of LSRs (label switch routers), not passes through the forwarding 
code.

> This is relevant because I think the code may actually be wrong in the
> local reception case.  By performing penultimate hop popping and
> receiving the code on loopback I think this code allows bypassing
> iptables rules that apply to incoming ip packets.  Certainly there is a
> loss of information as to which hardware interface the packet came in on
> that it may be desirable to correct.

Indeed, but network operators may well want to apply different rules to 
traffic coming in as IP versus traffic coming in as MPLS.

This may well merit a comment of its own, but this isn't directly 
relevant to the comment I'm removing.

Thanks,
Rob

>
> Eric
>
>
>> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
>> Signed-off-by: Robert Shearman <rshearma@brocade.com>
>> ---
>>   net/mpls/af_mpls.c | 1 -
>>   1 file changed, 1 deletion(-)
>>
>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>> index 0d6763a..bf3459a 100644
>> --- a/net/mpls/af_mpls.c
>> +++ b/net/mpls/af_mpls.c
>> @@ -199,7 +199,6 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>>   	skb->protocol = htons(ETH_P_MPLS_UC);
>>
>>   	if (unlikely(!new_header_size && dec.bos)) {
>> -		/* Penultimate hop popping */
>>   		if (!mpls_egress(rt, skb, dec))
>>   			goto drop;
>>   	} else {

* Re: [PATCH net-next v2 3/5] mpls: Differentiate implicit-null and unlabeled neighbours
  2015-03-22 21:06       ` Eric W. Biederman
@ 2015-03-23 11:47         ` Robert Shearman
  0 siblings, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-03-23 11:47 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: davem, netdev

On 22/03/15 21:06, Eric W. Biederman wrote:
> ebiederm@xmission.com (Eric W. Biederman) writes:
>
>> Robert Shearman <rshearma@brocade.com> writes:
>>
>>> The control plane can advertise labels for neighbours that don't have
>>> an outgoing label. RFC 3032 s3.22 states that either the remaining
>>> labels should be popped (if the control plane can determine that it's
>>> safe to do so, which in light of MPLS-VPN, RFC 4364, is never the case
>>> now) or that the packet should be discarded.
>>
>> I can not figure out what you are referring to.  There is no section 3.2
>> in RFC3022.
>
> I have found it.  That is RFC3031 Section 3.22.  This is something
> the code already does.  If the label can not be looked up with
> mpls_route_input_rcu the packet is dropped.

No, the existing code handles the lack of an incoming label. s3.22 is 
stating what should be done with the lack of an outgoing label.

> Beyond that I believe the rest of my comments still stand.  If you want
> to do this explicitly some form of explicit blackhole route needs to be
> supported.  Either just allowing a route to be configured with no output
> device or an explicit RTN_BLACKHOLE route.

No, that isn't going to address the problem this patch solves.

>
>>> Therefore, if the peer is unlabeled and the last label wasn't popped
>>> then drop the packet. The peer being unlabeled is signalled by an
>>> empty label stack. However, implicit-null still needs to be supported
>>> (i.e. penultimate hop popping) where the incoming label is popped and
>>> no labels are put on and the packet can still go out labeled with the
>>> unpopped part of the stack. This is achieved by the control plane
>>> specifying a label stack consisting of the single special
>>> implicit-null value.
>>
>> As I understand it you want to handle the case for a label for which
>> there is no next hop, and the packet should be black-holed.
>>
>> In struct mpls_route such routes are currently represented by routes
>> that have no network device.  And in rtnetlink should be represented
>> with routes of type RTN_BLACKHOLE which I do not currently support
>> parsing.  But that should be simple enough to correct.
>>
>> With respect to Implicit NULL it should be an error to accept a route
>> that has an RTA_NEWDST that includes an implicit NULL.
>>
>> The rtnetlink is not ldp nor should it have ldp semantics and be made
>> complicated by those semantics.

This isn't specific to LDP - it is used by MP-BGP as well, and indeed it
would be perfectly valid to specify it in static configuration. As per
RFC3031 s4.1.5 (https://tools.ietf.org/html/rfc3031#section-4.1.5) the
implicit-null label signals that penultimate hop popping should be done,
as opposed to dropping the packet if it would go out as MPLS (s3.22).
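
To make the distinction concrete, here is a minimal standalone sketch of
the decision I have in mind. It is not the patch code: the struct, enum
and function names are hypothetical and chosen only for illustration.
The one value taken from the RFCs is the reserved implicit-null label, 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LABEL_IMPLICIT_NULL 3u	/* reserved label value, RFC 3032 s2.1 */

enum fwd_action { FWD_PAYLOAD, FWD_MPLS, FWD_DROP };

/* Hypothetical next-hop entry: num_labels == 0 means an unlabeled peer. */
struct nhlfe_sketch {
	unsigned int num_labels;
	uint32_t labels[2];
};

/* Decide what happens after the incoming top label has been popped.
 * popped_bos is true when the popped label was the bottom of the stack.
 */
static enum fwd_action egress_action(const struct nhlfe_sketch *nh,
				     bool popped_bos)
{
	bool implicit_null = nh->num_labels == 1 &&
			     nh->labels[0] == LABEL_IMPLICIT_NULL;

	assert(nh->num_labels <= 2);

	/* Nothing left of the stack and nothing to push: send the payload. */
	if (popped_bos && (nh->num_labels == 0 || implicit_null))
		return FWD_PAYLOAD;

	/* Labels remain but the peer is unlabeled: RFC 3031 s3.22, discard. */
	if (nh->num_labels == 0)
		return FWD_DROP;

	/* Implicit-null: the rest of the stack goes out as-is (PHP).
	 * Otherwise the configured labels are pushed.
	 */
	return FWD_MPLS;
}
```

An unlabeled peer and an implicit-null peer behave identically only when
the popped label was the last one; with labels remaining, the first
drops and the second forwards.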

Thanks,
Rob

>> The semantics of RTA_NEWDST are the labels to push on after the top most
>> label has been popped off.  I see no reason to include other mechanisms
>> into that processing when it is easy enough to add or tweak other
>> attributes to have those semantics.
>>
>> Certainly it is not something that I think is worth special casing on
>> the fast path in mpls_forward.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH net-next v2 4/5] mpls: Per-device enabling of packet forwarding
  2015-03-22 20:02     ` Eric W. Biederman
  2015-03-22 20:34       ` Eric W. Biederman
@ 2015-03-23 13:10       ` Robert Shearman
  1 sibling, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-03-23 13:10 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: davem, netdev

On 22/03/15 20:02, Eric W. Biederman wrote:
> Robert Shearman <rshearma@brocade.com> writes:
>
>> An MPLS network is a single trust domain where the edges must be in
>> control of what labels make their way into the core. The simplest way
>> of ensuring this is for the edge device to always impose the labels, and
>> not allow forwarding of labeled traffic from untrusted neighbours. This is
>> achieved by allowing a per-device configuration of whether MPLS
>> traffic received over that interface should be forwarded or not.
>>
>> To be secure by default, MPLS is now initially disabled on all
>> interfaces (except the loopback) until explicitly enabled and no
>> global option is provided to change the default. Whilst this differs
>> from other protocols (e.g. IPv6), network operators are used to
>> explicitly enabling MPLS forwarding on interfaces, and with the number
>> of links to the MPLS core typically fairly low this doesn't present
>> too much of a burden on operators.
>
> Overall this patch looks like the correct direction to go.
>
> And a default disable is the right way to go for new features, that way
> even if the code is compiled in people don't get surprised by new
> behavior when they upgrade kernels.
>
> It would be very nice if the check for ARPHRD types was moved from
> mpls_route_add to mpls_add_dev.  Which would save memory and complexity
> when mpls is not supported on a network device type.

That check is for output, rather than input which is what this patch 
affects. If this affected both, or there was a separate knob for the 
output side then I'd agree with you.

Thanks,
Rob

>
> Eric
>
>> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
>> Signed-off-by: Robert Shearman <rshearma@brocade.com>
>> ---
>>   Documentation/networking/mpls-sysctl.txt |   9 +++
>>   include/linux/netdevice.h                |   4 ++
>>   net/mpls/af_mpls.c                       | 115 ++++++++++++++++++++++++++++++-
>>   net/mpls/internal.h                      |   6 ++
>>   4 files changed, 133 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/networking/mpls-sysctl.txt b/Documentation/networking/mpls-sysctl.txt
>> index 639ddf0..f48772c 100644
>> --- a/Documentation/networking/mpls-sysctl.txt
>> +++ b/Documentation/networking/mpls-sysctl.txt
>> @@ -18,3 +18,12 @@ platform_labels - INTEGER
>>
>>   	Possible values: 0 - 1048575
>>   	Default: 0
>> +
>> +conf/<interface>/forwarding - BOOL
>> +	Forward packets received on this interface.
>> +
>> +	If disabled, packets will be discarded without further
>> +	processing.
>> +
>> +	0 - disabled (default)
>> +	not 0 - enabled
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 76951c5..ee4ca06 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -60,6 +60,7 @@ struct phy_device;
>>   struct wireless_dev;
>>   /* 802.15.4 specific */
>>   struct wpan_dev;
>> +struct mpls_dev;
>>
>>   void netdev_set_default_ethtool_ops(struct net_device *dev,
>>   				    const struct ethtool_ops *ops);
>> @@ -1615,6 +1616,9 @@ struct net_device {
>>   	void			*ax25_ptr;
>>   	struct wireless_dev	*ieee80211_ptr;
>>   	struct wpan_dev		*ieee802154_ptr;
>> +#if IS_ENABLED(CONFIG_MPLS_ROUTING)
>> +	struct mpls_dev __rcu	*mpls_ptr;
>> +#endif
>>
>>   /*
>>    * Cache lines mostly used on receive path (including eth_type_trans())
>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>> index e3586a7..14c7e76 100644
>> --- a/net/mpls/af_mpls.c
>> +++ b/net/mpls/af_mpls.c
>> @@ -54,6 +54,11 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
>>   	return rt;
>>   }
>>
>> +static inline struct mpls_dev *mpls_dev_get(const struct net_device *dev)
>> +{
>> +	return rcu_dereference_rtnl(dev->mpls_ptr);
>> +}
>> +
>>   static bool mpls_output_possible(const struct net_device *dev)
>>   {
>>   	return dev && (dev->flags & IFF_UP) && netif_carrier_ok(dev);
>> @@ -137,6 +142,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>>   	struct mpls_route *rt;
>>   	struct mpls_entry_decoded dec;
>>   	struct net_device *out_dev;
>> +	struct mpls_dev *mdev;
>>   	unsigned int hh_len;
>>   	unsigned int new_header_size;
>>   	unsigned int mtu;
>> @@ -144,6 +150,10 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>>
>>   	/* Careful this entire function runs inside of an rcu critical section */
>>
>> +	mdev = mpls_dev_get(dev);
>> +	if (!mdev || !mdev->fwd_enabled)
>> +		goto drop;
>> +
>>   	if (skb->pkt_type != PACKET_HOST)
>>   		goto drop;
>>
>> @@ -440,10 +450,96 @@ errout:
>>   	return err;
>>   }
>>
>> +#define MPLS_PERDEV_SYSCTL_OFFSET(field)	\
>> +	(&((struct mpls_dev *)0)->field)
>> +
>> +static const struct ctl_table mpls_dev_table[] = {
>> +	{
>> +		.procname	= "forwarding",
>> +		.maxlen		= sizeof(int),
>> +		.mode		= 0644,
>> +		.proc_handler	= proc_dointvec,
>> +		.data		= MPLS_PERDEV_SYSCTL_OFFSET(fwd_enabled),
>> +	},
>> +	{ }
>> +};
>> +
>> +static int mpls_dev_sysctl_register(struct net_device *dev,
>> +				    struct mpls_dev *mdev)
>> +{
>> +	char path[sizeof("net/mpls/conf/") + IFNAMSIZ];
>> +	struct ctl_table *table;
>> +	int i;
>> +
>> +	table = kmemdup(&mpls_dev_table, sizeof(mpls_dev_table), GFP_KERNEL);
>> +	if (!table)
>> +		goto out;
>> +
>> +	/* Table data contains only offsets relative to the base of
>> +	 * the mdev at this point, so make them absolute.
>> +	 */
>> +	for (i = 0; i < ARRAY_SIZE(mpls_dev_table); i++)
>> +		table[i].data = (char *)mdev + (uintptr_t)table[i].data;
>> +
>> +	snprintf(path, sizeof(path), "net/mpls/conf/%s", dev->name);
>> +
>> +	mdev->sysctl = register_net_sysctl(dev_net(dev), path, table);
>> +	if (!mdev->sysctl)
>> +		goto free;
>> +
>> +	return 0;
>> +
>> +free:
>> +	kfree(table);
>> +out:
>> +	return -ENOBUFS;
>> +}
>> +
>> +static void mpls_dev_sysctl_unregister(struct mpls_dev *mdev)
>> +{
>> +	struct ctl_table *table;
>> +
>> +	table = mdev->sysctl->ctl_table_arg;
>> +	unregister_net_sysctl_table(mdev->sysctl);
>> +	kfree(table);
>> +}
>> +
>> +static struct mpls_dev *mpls_add_dev(struct net_device *dev)
>> +{
>> +	struct mpls_dev *mdev;
>> +	int err = -ENOMEM;
>> +
>> +	ASSERT_RTNL();
>> +
>> +	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
>> +	if (!mdev)
>> +		return ERR_PTR(err);
>> +
>> +	/* Enable MPLS by default on loopback devices, since this
>> +	 * doesn't represent a security boundary and is required for the
>> +	 * lookup of inner labels for LSPs terminating on this router.
>> +	 */
>> +	if (dev->flags & IFF_LOOPBACK)
>> +		mdev->fwd_enabled = 1;
>> +
>> +	err = mpls_dev_sysctl_register(dev, mdev);
>> +	if (err)
>> +		goto free;
>> +
>> +	rcu_assign_pointer(dev->mpls_ptr, mdev);
>> +
>> +	return mdev;
>> +
>> +free:
>> +	kfree(mdev);
>> +	return ERR_PTR(err);
>> +}
>> +
>>   static void mpls_ifdown(struct net_device *dev)
>>   {
>>   	struct mpls_route __rcu **platform_label;
>>   	struct net *net = dev_net(dev);
>> +	struct mpls_dev *mdev;
>>   	unsigned index;
>>
>>   	platform_label = rtnl_dereference(net->mpls.platform_label);
>> @@ -455,14 +551,31 @@ static void mpls_ifdown(struct net_device *dev)
>>   			continue;
>>   		rt->rt_dev = NULL;
>>   	}
>> +
>> +	mdev = mpls_dev_get(dev);
>> +	if (!mdev)
>> +		return;
>> +
>> +	mpls_dev_sysctl_unregister(mdev);
>> +
>> +	RCU_INIT_POINTER(dev->mpls_ptr, NULL);
>> +
>> +	kfree(mdev);
>>   }
>>
>>   static int mpls_dev_notify(struct notifier_block *this, unsigned long event,
>>   			   void *ptr)
>>   {
>>   	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
>> +	struct mpls_dev *mdev;
>>
>>   	switch(event) {
>> +	case NETDEV_REGISTER:
>> +		mdev = mpls_add_dev(dev);
>> +		if (IS_ERR(mdev))
>> +			return notifier_from_errno(PTR_ERR(mdev));
>> +		break;
>> +
>>   	case NETDEV_UNREGISTER:
>>   		mpls_ifdown(dev);
>>   		break;
>> @@ -924,7 +1037,7 @@ static int mpls_platform_labels(struct ctl_table *table, int write,
>>   	return ret;
>>   }
>>
>> -static struct ctl_table mpls_table[] = {
>> +static const struct ctl_table mpls_table[] = {
>>   	{
>>   		.procname	= "platform_labels",
>>   		.data		= NULL,
>> diff --git a/net/mpls/internal.h b/net/mpls/internal.h
>> index 5732283..e676a43 100644
>> --- a/net/mpls/internal.h
>> +++ b/net/mpls/internal.h
>> @@ -23,6 +23,12 @@ struct mpls_entry_decoded {
>>   	u8 bos;
>>   };
>>
>> +struct mpls_dev {
>> +	int			fwd_enabled;
>> +
>> +	struct ctl_table_header *sysctl;
>> +};
>> +
>>   struct sk_buff;
>>
>>   static inline struct mpls_shim_hdr *mpls_hdr(const struct sk_buff *skb)

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH net-next v2 4/5] mpls: Per-device enabling of packet forwarding
  2015-03-22 20:34       ` Eric W. Biederman
@ 2015-03-23 13:42         ` Robert Shearman
  0 siblings, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-03-23 13:42 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: davem, netdev

On 22/03/15 20:34, Eric W. Biederman wrote:
> ebiederm@xmission.com (Eric W. Biederman) writes:
>
>> Robert Shearman <rshearma@brocade.com> writes:
>>
>>> An MPLS network is a single trust domain where the edges must be in
>>> control of what labels make their way into the core. The simplest way
>>> of ensuring this is for the edge device to always impose the labels, and
>>> not allow forwarding of labeled traffic from untrusted neighbours. This is
>>> achieved by allowing a per-device configuration of whether MPLS
>>> traffic received over that interface should be forwarded or not.
>>>
>>> To be secure by default, MPLS is now initially disabled on all
>>> interfaces (except the loopback) until explicitly enabled and no
>>> global option is provided to change the default. Whilst this differs
>>> from other protocols (e.g. IPv6), network operators are used to
>>> explicitly enabling MPLS forwarding on interfaces, and with the number
>>> of links to the MPLS core typically fairly low this doesn't present
>>> too much of a burden on operators.
>>
>> Overall this patch looks like the correct direction to go.
>>
>> And a default disable is the right way to go for new features, that way
>> even if the code is compiled in people don't get surprised by new
>> behavior when they upgrade kernels.
>>
>> It would be very nice if the check for ARPHRD types was moved from
>> mpls_route_add to mpls_add_dev.  Which would save memory and complexity
>> when mpls is not supported on a network device type.
>
> There is also a question of do we want "forwarding" to be the parameter
> we are controlling.  The other option is not "forwarding" but mpls
> "enable".
>
> Completely disabling mpls on a device might be too strong as it would
> presumably work for output as well as input.
>
> Forwarding at least for ipv4 and ipv6 has the semantic that you can
> still accept packets that are routed to yourself, which your
> implementation of forwarding does not.

Arguably, the implementation here doesn't implement routes to itself 
(i.e. you'd need a reswitch to deaggregate first). However, I can see 
the argument that this might be confusing for users more familiar with 
ipv4/ipv6 semantics.

> So I expect what we actually want here is either "enable" or two
> knobs "input" and "output".

The semantics of enable or output become confusing because would they 
control whether MPLS routes would be allowed out of that interface, or 
would they control whether traffic could go out as MPLS (in the light of 
patch 3, i.e. allow only IP)? I think it would have to be the latter to 
be useful. There's also the question of what the default value should be 
(i.e. it's safe for it to be enabled by default, unlike on input).

From a practical perspective, I don't really see the need at the moment
for a flag to control whether traffic can be sent out of that interface
or not. The user can already control that by installing or not
installing routes out of that interface as desired.

Therefore, to keep the code simple I propose that I rename the parameter 
to "input" as you suggest and someone else can implement "output" if 
they find a use case for it.

Thanks,
Rob

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH net-next v2 5/5] mpls: Allow payload type to be associated with label routes
  2015-03-22 20:56     ` Eric W. Biederman
@ 2015-03-23 14:02       ` Robert Shearman
  0 siblings, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-03-23 14:02 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: davem, netdev

On 22/03/15 20:56, Eric W. Biederman wrote:
> Robert Shearman <rshearma@brocade.com> writes:
>
>> RFC 4182 s2 states that if an IPv4 Explicit NULL label is the only
>> label on the stack, then after popping the resulting packet must be
>> treated as a IPv4 packet and forwarded based on the IPv4 header. The
>> same is true for IPv6 Explicit NULL with an IPv6 packet following.
>>
>> Therefore, when installing the IPv4/IPv6 Explicit NULL label routes,
>> add an attribute that specifies the expected payload type for use at
>> forwarding time for determining the type of the encapsulated packet
>> instead of inspecting the first nibble of the packet.
>
> So this patch is not wrong.  And at a practical level it is a good
> idea to enforce ipv4 when the ipv4 explicit null label is present
> and similarly with ipv6.
>
> I do have some quibbles.
>
> First I want to point out that RFC3032 section 2.2 talks about using
> a label in combination with the packet's contents to figure out the
> type of packet that is being transmitted.  IPv4 and IPv6 do count as a
> set of network layer protocols that can be distinguished by inspection
> of the network layer header.

I'm confused why you feel this is a quibble. This patch allows this case 
and even documents that this can be done:

 >> +	MPT_UNSPEC, /* IPv4 or IPv6 */

I haven't added any warnings or barriers to using this even with it 
being orthogonal to the direction all the other known MPLS stacks have 
gone in, as we discussed in a previous thread.

> Changing mpls_egress to mpls_bos_egress bothers me a little, because it
> seems redundant.  But I can see an argument for that name change.
>
> I think it would be cleaner if we set MPT_IPV4 = 4 and MPT_IPV6 = 6,
> which would remove a switch statement in mpls_pkt_determine_af.

Ok.
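
For reference, a rough userspace sketch of what that suggestion could
look like. The names below are hypothetical and this is not the v3 code;
the only assumption is that the enum values are made equal to the IP
version number so the first nibble can be used directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload types whose values match the IP version field. */
enum payload_type_sketch {
	PT_UNSPEC = 0,	/* unknown: neither IPv4 nor IPv6 */
	PT_IPV4   = 4,
	PT_IPV6   = 6,
};

/* Derive the payload type from the first nibble of the payload,
 * without a switch statement mapping versions to enum values.
 */
static enum payload_type_sketch payload_type_from_header(const uint8_t *pkt,
							 size_t len)
{
	uint8_t version;

	assert(pkt != NULL || len == 0);
	if (len == 0)
		return PT_UNSPEC;

	version = pkt[0] >> 4;	/* the version is the top four bits */
	if (version == PT_IPV4 || version == PT_IPV6)
		return (enum payload_type_sketch)version;

	return PT_UNSPEC;
}
```

Both values still fit in the 3-bit rt_payload_type field, since the
largest is 6.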

> You delete my big fat comment referring people to how packets are
> encoded in mpls.  That seems unfortunate, because it can be easy to get
> lost in the MPLS rfcs, and I am certain someone will want to do more
> than support IPv4 and IPv6.

Yes, I deleted the comment because it refers to determining the type of 
packet using the first nibble for the pseudo-wire with control-word 
case, which as we discussed in a previous thread is contrary to the 
intention of the author of the RFC draft that defines it. I can 
certainly keep the references to the RFCs around though.

>
> Given the number of pseudo wire types I do believe that 3 bits is going
> to be too small to encode everything going forward.

I can steal another bit from the number of labels if you'd prefer, but 
if you're suggesting moving this out to a full 8-bit field then I don't 
see the need to over-engineer this and use more memory given that this 
can easily be changed going forward.
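
To illustrate the trade-off, here is a small standalone sketch of the
1 + 3 + 4 bit split used in the patch. The struct and helper names are
hypothetical, and it uses unsigned int bit-fields for portability where
the kernel struct uses u8.

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit widths as rt_unlabeled, rt_payload_type and rt_labels. */
struct route_flags_sketch {
	unsigned int unlabeled    : 1;
	unsigned int payload_type : 3;	/* values 0..7 */
	unsigned int labels       : 4;	/* values 0..15 */
};

/* True if value can be stored in a bit-field of the given width. */
static bool fits_in_bits(unsigned int value, unsigned int width)
{
	assert(width > 0 && width < 32);
	return value < (1u << width);
}

/* Store a payload type and read it back; values above 7 wrap modulo 8. */
static unsigned int stored_payload_type(unsigned int value)
{
	struct route_flags_sketch r = { 0 };

	r.payload_type = value;
	return r.payload_type;
}
```

Three bits allow eight payload types; moving one bit over from the label
count would give sixteen while still leaving room for a count of up to
seven labels.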

Thanks,
Rob

>
>> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
>> Signed-off-by: Robert Shearman <rshearma@brocade.com>
>> ---
>>   net/mpls/af_mpls.c | 87 ++++++++++++++++++++++++++++++++++--------------------
>>   1 file changed, 55 insertions(+), 32 deletions(-)
>>
>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>> index 14c7e76..653bae1 100644
>> --- a/net/mpls/af_mpls.c
>> +++ b/net/mpls/af_mpls.c
>> @@ -23,13 +23,20 @@
>>   /* This maximum ha length copied from the definition of struct neighbour */
>>   #define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
>>
>> +enum mpls_payload_type {
>> +	MPT_UNSPEC, /* IPv4 or IPv6 */
>> +	MPT_IPV4,
>> +	MPT_IPV6,
>> +};
>> +
>>   struct mpls_route { /* next hop label forwarding entry */
>>   	struct net_device __rcu *rt_dev;
>>   	struct rcu_head		rt_rcu;
>>   	u32			rt_label[MAX_NEW_LABELS];
>>   	u8			rt_protocol; /* routing protocol that set this entry */
>>   	u8                      rt_unlabeled : 1;
>> -	u8			rt_labels : 7;
>> +	u8                      rt_payload_type : 3;
>> +	u8			rt_labels : 4;
>>   	u8			rt_via_alen;
>>   	u8			rt_via_table;
>>   	u8			rt_via[0];
>> @@ -87,19 +94,24 @@ static bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
>>   	return true;
>>   }
>>
>> -static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
>> -			struct mpls_entry_decoded dec)
>> +static enum mpls_payload_type mpls_pkt_determine_af(struct sk_buff *skb)
>>   {
>> -	/* RFC4385 and RFC5586 encode other packets in mpls such that
>> -	 * they don't conflict with the ip version number, making
>> -	 * decoding by examining the ip version correct in everything
>> -	 * except for the strangest cases.
>> -	 *
>> -	 * The strange cases if we choose to support them will require
>> -	 * manual configuration.
>> -	 */
>> -	struct iphdr *hdr4;
>> -	bool success = true;
>> +	struct iphdr *hdr4 = ip_hdr(skb);
>> +
>> +	switch (hdr4->version) {
>> +	case 4:
>> +		return MPT_IPV4;
>> +	case 6:
>> +		return MPT_IPV6;
>> +	}
>> +
>> +	return MPT_UNSPEC;
>> +}
>> +
>> +static bool mpls_bos_egress(struct mpls_route *rt, struct sk_buff *skb,
>> +			    struct mpls_entry_decoded dec)
>> +{
>> +	enum mpls_payload_type payload_type;
>>
>>   	/* The IPv4 code below accesses through the IPv4 header
>>   	 * checksum, which is 12 bytes into the packet.
>> @@ -114,24 +126,31 @@ static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
>>   	if (!pskb_may_pull(skb, 12))
>>   		return false;
>>
>> -	/* Use ip_hdr to find the ip protocol version */
>> -	hdr4 = ip_hdr(skb);
>> -	if (hdr4->version == 4) {
>> +	payload_type = rt->rt_payload_type;
>> +	if (payload_type == MPT_UNSPEC)
>> +		payload_type = mpls_pkt_determine_af(skb);
>> +
>> +	switch (payload_type) {
>> +	case MPT_IPV4: {
>> +		struct iphdr *hdr4 = ip_hdr(skb);
>>   		skb->protocol = htons(ETH_P_IP);
>>   		csum_replace2(&hdr4->check,
>>   			      htons(hdr4->ttl << 8),
>>   			      htons(dec.ttl << 8));
>>   		hdr4->ttl = dec.ttl;
>> +		return true;
>>   	}
>> -	else if (hdr4->version == 6) {
>> +	case MPT_IPV6: {
>>   		struct ipv6hdr *hdr6 = ipv6_hdr(skb);
>>   		skb->protocol = htons(ETH_P_IPV6);
>>   		hdr6->hop_limit = dec.ttl;
>> +		return true;
>>   	}
>> -	else
>> -		/* version 0 and version 1 are used by pseudo wires */
>> -		success = false;
>> -	return success;
>> +	case MPT_UNSPEC:
>> +		break;
>> +	}
>> +
>> +	return false;
>>   }
>>
>>   static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>> @@ -210,7 +229,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>>   	skb->protocol = htons(ETH_P_MPLS_UC);
>>
>>   	if (unlikely(!new_header_size && dec.bos)) {
>> -		if (!mpls_egress(rt, skb, dec))
>> +		if (!mpls_bos_egress(rt, skb, dec))
>>   			goto drop;
>>   	} else if (rt->rt_unlabeled) {
>>   		/* Labeled traffic destined to unlabeled peer should
>> @@ -253,16 +272,17 @@ static const struct nla_policy rtm_mpls_policy[RTA_MAX+1] = {
>>   };
>>
>>   struct mpls_route_config {
>> -	u32		rc_protocol;
>> -	u32		rc_ifindex;
>> -	u16		rc_via_table;
>> -	u16		rc_via_alen;
>> -	u8		rc_via[MAX_VIA_ALEN];
>> -	u32		rc_label;
>> -	u32		rc_output_labels;
>> -	u32		rc_output_label[MAX_NEW_LABELS];
>> -	u32		rc_nlflags;
>> -	struct nl_info	rc_nlinfo;
>> +	u32			rc_protocol;
>> +	u32			rc_ifindex;
>> +	u16			rc_via_table;
>> +	u16			rc_via_alen;
>> +	u8			rc_via[MAX_VIA_ALEN];
>> +	u32			rc_label;
>> +	u32			rc_output_labels;
>> +	u32			rc_output_label[MAX_NEW_LABELS];
>> +	u32			rc_nlflags;
>> +	enum mpls_payload_type	rc_payload_type;
>> +	struct nl_info		rc_nlinfo;
>>   };
>>
>>   static struct mpls_route *mpls_rt_alloc(size_t alen)
>> @@ -413,6 +433,7 @@ static int mpls_route_add(struct mpls_route_config *cfg)
>>   	}
>>   	rt->rt_protocol = cfg->rc_protocol;
>>   	RCU_INIT_POINTER(rt->rt_dev, dev);
>> +	rt->rt_payload_type = cfg->rc_payload_type;
>>   	rt->rt_via_table = cfg->rc_via_table;
>>   	memcpy(rt->rt_via, cfg->rc_via, cfg->rc_via_alen);
>>
>> @@ -948,6 +969,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
>>   			goto nort0;
>>   		RCU_INIT_POINTER(rt0->rt_dev, lo);
>>   		rt0->rt_protocol = RTPROT_KERNEL;
>> +		rt0->rt_payload_type = MPT_IPV4;
>>   		rt0->rt_via_table = NEIGH_LINK_TABLE;
>>   		memcpy(rt0->rt_via, lo->dev_addr, lo->addr_len);
>>   	}
>> @@ -958,6 +980,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
>>   			goto nort2;
>>   		RCU_INIT_POINTER(rt2->rt_dev, lo);
>>   		rt2->rt_protocol = RTPROT_KERNEL;
>> +		rt2->rt_payload_type = MPT_IPV6;
>>   		rt2->rt_via_table = NEIGH_LINK_TABLE;
>>   		memcpy(rt2->rt_via, lo->dev_addr, lo->addr_len);
>>   	}

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH net-next v2 2/5] mpls: Remove incorrect PHP comment
  2015-03-23 11:32       ` Robert Shearman
@ 2015-03-23 18:16         ` Eric W. Biederman
  2015-03-24 15:18           ` Robert Shearman
  0 siblings, 1 reply; 68+ messages in thread
From: Eric W. Biederman @ 2015-03-23 18:16 UTC (permalink / raw)
  To: Robert Shearman; +Cc: davem, netdev

Robert Shearman <rshearma@brocade.com> writes:

> On 22/03/15 19:12, Eric W. Biederman wrote:
>> Robert Shearman <rshearma@brocade.com> writes:
>>
>>> Popping the last label on the stack does not necessarily imply
>>> performing penultimate hop popping. There is no reason why this
>>> couldn't be the last hop in the network, so remove the comment.
>>
>> So this change I will disagree with.
>>
>> What the code implements is Penultimate hop popping.  Even if you send
>> the packets over loopback that is what the code is doing.
>
> No, RFC3031 s3.16 (https://tools.ietf.org/html/rfc3031#page-18) talks in terms
> of LSRs (label switch routers), not passes through the forwarding
> code.

In very simple terms the code always removes the labels and forwards the
code.  That is by definition penultimate hop popping.  That is all that
is implemented in the code today.  And that is what the comment is
noting.

>> This is relevant because I think the code may actually be wrong in the
>> local reception case.  By performing penultimate hop popping and
>> receiving the code on loopback I think this code allows bypassing
>> iptables rules that apply to incoming ip packets.  Certainly there is a
>> loss of information as to which hardware interface the packet came in on
>> that it may be desirable to correct.
>
> Indeed, but network operators may well want to apply different rules to traffic
> coming in as IP versus traffic coming in as MPLS.
>
> This may well merit a comment of its own, but this isn't directly relevant to
> the comment I'm removing.

My point, and what is directly relevant, is that the case of local
delivery is a hack.  A pretty strong case can be made that this hack does
the wrong thing, and it is something we probably should fix before the
code makes it to Linus for 4.1, so the bug does not get cast in stone.

In other words the disparity between the comment and the code indicates
an actual bug, not just a wrong comment.

Eric

>>> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
>>> Signed-off-by: Robert Shearman <rshearma@brocade.com>
>>> ---
>>>   net/mpls/af_mpls.c | 1 -
>>>   1 file changed, 1 deletion(-)
>>>
>>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>>> index 0d6763a..bf3459a 100644
>>> --- a/net/mpls/af_mpls.c
>>> +++ b/net/mpls/af_mpls.c
>>> @@ -199,7 +199,6 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>>>   	skb->protocol = htons(ETH_P_MPLS_UC);
>>>
>>>   	if (unlikely(!new_header_size && dec.bos)) {
>>> -		/* Penultimate hop popping */
>>>   		if (!mpls_egress(rt, skb, dec))
>>>   			goto drop;
>>>   	} else {

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH net-next v2 2/5] mpls: Remove incorrect PHP comment
  2015-03-23 18:16         ` Eric W. Biederman
@ 2015-03-24 15:18           ` Robert Shearman
  2015-03-24 18:43             ` Vivek Venkatraman
  0 siblings, 1 reply; 68+ messages in thread
From: Robert Shearman @ 2015-03-24 15:18 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: davem, netdev

On 23/03/15 18:16, Eric W. Biederman wrote:
> Robert Shearman <rshearma@brocade.com> writes:
>
>> On 22/03/15 19:12, Eric W. Biederman wrote:
>>> Robert Shearman <rshearma@brocade.com> writes:
>>>
>>>> Popping the last label on the stack does not necessarily imply
>>>> performing penultimate hop popping. There is no reason why this
>>>> couldn't be the last hop in the network, so remove the comment.
>>>
>>> So this change I will disagree with.
>>>
>>> What the code implements is Penultimate hop popping.  Even if you send
>>> the packets over loopback that is what the code is doing.
>>
>> No, RFC3031 s3.16 (https://tools.ietf.org/html/rfc3031#page-18) talks in terms
>> of LSRs (label switch routers), not passes through the forwarding
>> code.
>
> In very simple terms the code always removes the labels and forwards the
> code.  That is by definition penultimate hop popping.  That is all that
> is implemented in the code today.  And that is what the comment is
> noting.
>

I think we might be talking at cross purposes here so I'd like to back 
up a little bit and make sure that we are using the same terminology and 
are talking about the same problem.

In terms of terminology this is what I mean -

PHP means popping the label in the penultimate hop router in the 
*signalled* LSP. Note: There may be more hops beyond this LSP - either 
MPLS using an inner label or payload - such as IP.

UHP means popping the label in the ultimate hop router in the signalled 
LSP. Again there may be further hops that are outside of the LSP.

Orthogonal to this is the relation of the FEC to the UH router.

a) All traffic for this FEC needs to be sent to another router (which is 
beyond this LSP).
b) The FEC requires the UH router to do a payload lookup in order to
identify where the payload should go. This can be a label lookup on the
next label in the stack (e.g. where the outer label is a TE tunnel over
which further LSPs have been set up). Or it can be a payload protocol lookup
e.g. IP. There are a couple of sub-cases here - the common case is where 
the FEC is for a prefix that is directly connected to the UH - in which 
case the lookup we need is conceptually a neighbour lookup. Another case 
is where the FEC corresponds to a summary prefix that this router has 
advertised to its peer - in which case the lookup is a routing table lookup.
c) The FEC is destined for the UH router and needs to be "received" by
it. Typically this case can be handled in exactly the same way as B
above, but I mention it here for completeness.

Does this terminology work for you?

If it does then I hope you will also agree that PHP is really aimed at 
decreasing the work for the UH router in the cases where the FEC 
corresponds to B above. (And it also allows C to be handled using 
existing e.g. IP path mechanisms). If we want to do UHP then we'd 
require the UH router to first lookup the outer label (which would be 
exp-null) and then an additional protocol lookup.

But there are a couple of cases where you either can't or may not want 
to do PHP.

Firstly with an application like MPLS VPN the UH router is the PE and it 
needs the label in order to identify which routing table to do the (IP) 
payload lookup in.

Secondly, in case A, PHP doesn't offer any benefit. The PH is already 
able to use the incoming label to determine how to forward without 
needing a further lookup. It's a choice, but there are certainly stacks 
out there that will choose to advertise a real label for the FEC so that 
we do UHP here. (I'd actually expect that to be the norm).

Does this description of behaviours work for you?

If we are still together at this point then my last question is why you 
say that "remov[ing] a label and forward[ing] the [packet]" is "by 
definition penultimate-hop popping"?

I'm actually more interested in getting aligned on this than the comment 
itself. I also have some comments on what you say below about local 
delivery but I suspect it will all make more sense once we sort this out.

Thanks,
Rob

>>> This is relevant because I think the code may actually be wrong in the
>>> local reception case.  By performing penultimate hop popping and
>>> receiving the code on loopback I think this code allows bypassing
>>> iptables rules that apply to incoming ip packets.  Certainly there is a
>>> loss of information as to which hardware interface the packet came in on
>>> that it may be desirable to correct.
>>
>> Indeed, but network operators may well want to apply different rules to traffic
>> coming in as IP versus traffic coming in as MPLS.
>>
>> This may well merit a comment of its own, but this isn't directly relevant to
>> the comment I'm removing.
>
> My point is and what is directly relevant is the case of local delivery
> is a hack.  A hack that a pretty strong case can be made that it does
> the wrong thing and something we probably should fix before the code
> makes it to Linus for 4.1 so the bug does not get cast in stone.
>
> In other words the disparity between the comment and the code indicates
> an actual bug, not just a wrong comment.
>
> Eric
>
>>>> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
>>>> Signed-off-by: Robert Shearman <rshearma@brocade.com>
>>>> ---
>>>>    net/mpls/af_mpls.c | 1 -
>>>>    1 file changed, 1 deletion(-)
>>>>
>>>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>>>> index 0d6763a..bf3459a 100644
>>>> --- a/net/mpls/af_mpls.c
>>>> +++ b/net/mpls/af_mpls.c
>>>> @@ -199,7 +199,6 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>>>>    	skb->protocol = htons(ETH_P_MPLS_UC);
>>>>
>>>>    	if (unlikely(!new_header_size && dec.bos)) {
>>>> -		/* Penultimate hop popping */
>>>>    		if (!mpls_egress(rt, skb, dec))
>>>>    			goto drop;
>>>>    	} else {

* Re: [PATCH net-next v2 2/5] mpls: Remove incorrect PHP comment
  2015-03-24 15:18           ` Robert Shearman
@ 2015-03-24 18:43             ` Vivek Venkatraman
  0 siblings, 0 replies; 68+ messages in thread
From: Vivek Venkatraman @ 2015-03-24 18:43 UTC (permalink / raw)
  To: Robert Shearman; +Cc: Eric W. Biederman, davem, netdev

On Tue, Mar 24, 2015 at 8:18 AM, Robert Shearman <rshearma@brocade.com> wrote:
> On 23/03/15 18:16, Eric W. Biederman wrote:
>>
>> Robert Shearman <rshearma@brocade.com> writes:
>>
>>> On 22/03/15 19:12, Eric W. Biederman wrote:
>>>>
>>>> Robert Shearman <rshearma@brocade.com> writes:
>>>>
>>>>> Popping the last label on the stack does not necessarily imply
>>>>> performing penultimate hop popping. There is no reason why this
>>>>> couldn't be the last hop in the network, so remove the comment.
>>>>
>>>>
>>>> So this change I will disagree with.
>>>>
>>>> What the code implements is Penultimate hop popping.  Even if you send
>>>> the packets over loopback that is what the code is doing.
>>>
>>>
>>> No, RFC3031 s3.16 (https://tools.ietf.org/html/rfc3031#page-18) talks in
>>> terms of LSRs (label switch routers), not passes through the forwarding
>>> code.
>>
>>
>> In very simple terms the code always removes the labels and forwards the
>> code.  That is by definition penultimate hop popping.  That is all that
>> is implemented in the code today.  And that is what the comment is
>> noting.
>>

Yes, this is what the code implements today. I guess the point that Rob
might be making is that this is not necessarily PHP. The terminus of the
LSP (ultimate hop) need not always signal an implicit-null or an
explicit-null label upstream. In such a case, it is the ultimate hop that
would see a labeled packet and would pop the label.

What the code implements today is probably better called "pop and
forward" - the label directly identifies the nexthop to forward to. There
are other paradigms needed, as Rob describes below.

>
> I think we might be talking at cross purposes here so I'd like to back up a
> little bit and make sure that we are using the same terminology and are
> talking about the same problem.
>
> In terms of terminology this is what I mean -
>
> PHP means popping the label in the penultimate hop router in the *signalled*
> LSP. Note: There may be more hops beyond this LSP - either MPLS using an
> inner label or payload - such as IP.
>
> UHP means popping the label in the ultimate hop router in the signalled LSP.
> Again there may be further hops that are outside of the LSP.
>
> Orthogonal to this is the relation of the FEC to the UH router.
>
> a) All traffic for this FEC needs to be sent to another router (which is
> beyond this LSP).
> b) The FEC requires the UH router to do a payload lookup in order to
> identify where the payload should go. This can be a label lookup on the next
> label in (e.g. where the outer label is a TE tunnel over which further LSPs
> have been setup). Or it can be a payload protocol lookup e.g. IP. There are
> a couple of sub-cases here - the common case is where the FEC is for a
> prefix that is directly connected to the UH - in which case the lookup we
> need is conceptually a neighbour lookup. Another case is where the FEC
> corresponds to a summary prefix that this router has advertised to its peer
> - in which case the lookup is a routing table lookup.
> c) The FEC is destined for the UH router and needs to be "received" by it.
> Typically this case can be handled in exactly the same way as B above, but
> I mention it here for completeness.
>
> Does this terminology work for you?
>
> If it does then I hope you will also agree that PHP is really aimed at
> decreasing the work for the UH router in the cases where the FEC corresponds
> to B above. (And it also allows C to be handled using existing e.g. IP path
> mechanisms). If we want to do UHP then we'd require the UH router to first
> look up the outer label (which would be exp-null) and then do an additional
> protocol lookup.
>
> But there are a couple of cases where you either can't or may not want to do
> PHP.
>
> Firstly with an application like MPLS VPN the UH router is the PE and it
> needs the label in order to identify which routing table to do the (IP)
> payload lookup in.
>

I can relate to the terminology above. The one clarification I'll add to
the comment on MPLS VPN is that it is the VPN label that will identify the
routing table in which to do the IP/payload lookup. The PH cannot pop this
label as it doesn't even know about it. The same would be the case for PWE -
the PW label needs to reach the UH for it to be able to do the forwarding.
The PH may pop the LSP label (on which the L3VPN or PW is set up) or it may
not - based on what the UH signaled it to do.

Of course, the comment is one thing - the implementation would be
enhancements to what is already available. IMO, one key aspect of this would
be the ability to specify a label operation such as "pop" or "swap" - or even
"no-op", which may have a use case when the MPLS dataplane is used to achieve
Segment Routing.

For example,
ip -f mpls route add 200 pop
ip -f mpls route add 1400 nexthop via inet 192.168.12.1 dev eth1

would handle a label stack {200, 1400(S)} on the UH, pop the LSP label 200 and
(implicitly) pop the VPN label 1400 and forward to nexthop specified.

For example,
ip -f mpls route add 200 pop
ip -f mpls route add 1600 pop dev lo20

could handle a similar L3VPN scenario that requires a payload (IP) lookup.
The VRF (table to use) could be identified by rules associated with 'lo20'.
The incoming interface information is lost, but in this case, the most
relevant information is the VRF, which is determined by 'lo20' (which is
pointed to by the VPN label 1600).
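
To make the {200, 1400(S)} notation concrete, here is a minimal user-space
sketch of a label stack entry as laid out in RFC 3032 (20-bit label, 3-bit
traffic class, 1-bit bottom-of-stack flag "S", 8-bit TTL). The names
`struct lse`, `lse_encode`, `lse_decode` and `labels_until_bos` are
hypothetical and not kernel APIs, and the entries are handled in host byte
order here, whereas on the wire they are big-endian.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One decoded label stack entry (RFC 3032: label:20, TC:3, S:1, TTL:8). */
struct lse {
	uint32_t label;
	uint8_t tc;
	bool bos;	/* the "S" bit: set on the last entry of the stack */
	uint8_t ttl;
};

/* Pack the four fields into a 32-bit entry (host byte order). */
static uint32_t lse_encode(uint32_t label, uint8_t tc, bool bos, uint8_t ttl)
{
	assert(label <= 0xFFFFFu);	/* labels are 20 bits wide */
	return (label << 12) | ((uint32_t)(tc & 0x7) << 9) |
	       ((uint32_t)bos << 8) | ttl;
}

/* Unpack a 32-bit entry (host byte order) into its fields. */
static struct lse lse_decode(uint32_t entry)
{
	struct lse e;

	e.label = entry >> 12;
	e.tc = (entry >> 9) & 0x7;
	e.bos = (entry >> 8) & 0x1;
	e.ttl = entry & 0xff;
	return e;
}

/* Count entries up to and including the one with S set.
 * Returns 0 if no entry in the first n has the S bit set.
 */
static size_t labels_until_bos(const uint32_t *stack, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (lse_decode(stack[i]).bos)
			return i + 1;
	return 0;
}
```

For the stack above, the outer entry would carry label 200 with S clear and
the inner entry label 1400 with S set, so `labels_until_bos()` would report
two entries to process before the payload is reached.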

> Secondly, in case A, PHP doesn't offer any benefit. The PH is already able
> to use the incoming label to determine how to forward without needing a
> further lookup. It's a choice, but there are certainly stacks out there that
> will choose to advertise a real label for the FEC so that we do UHP here.
> (I'd actually expect that to be the norm).
>
> Does this description of behaviours work for you?
>
> If we are still together at this point then my last question is why you say
> that "remov[ing] a label and forward[ing] the [packet]" is "by definition
> penultimate-hop popping"?
>
> I'm actually more interested in getting aligned on this than the comment
> itself. I also have some comments on what you say below about local delivery
> but I suspect it will all make more sense once we sort this out.
>
> Thanks,
> Rob
>
>>>> This is relevant because I think the code may actually be wrong in the
>>>> local reception case.  By performing penultimate hop popping and
>>>> receiving the code on loopback I think this code allows bypassing
>>>> iptables rules that apply to incoming ip packets.  Certainly there is a
>>>> loss of information as to which hardware interface the packet came in on
>>>> that it may be desirable to correct.
>>>
>>>
>>> Indeed, but network operators may well want to apply different rules to
>>> traffic coming in as IP versus traffic coming in as MPLS.
>>>
>>> This may well merit a comment of its own, but this isn't directly
>>> relevant to the comment I'm removing.
>>
>>
>> My point is and what is directly relevant is the case of local delivery
>> is a hack.  A hack that a pretty strong case can be made that it does
>> the wrong thing and something we probably should fix before the code
>> makes it to Linus for 4.1 so the bug does not get cast in stone.
>>
>> In other words the disparity between the comment and the code indicates
>> an actual bug, not just a wrong comment.
>>
>> Eric
>>
>>>>> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
>>>>> Signed-off-by: Robert Shearman <rshearma@brocade.com>
>>>>> ---
>>>>>    net/mpls/af_mpls.c | 1 -
>>>>>    1 file changed, 1 deletion(-)
>>>>>
>>>>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>>>>> index 0d6763a..bf3459a 100644
>>>>> --- a/net/mpls/af_mpls.c
>>>>> +++ b/net/mpls/af_mpls.c
>>>>> @@ -199,7 +199,6 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>>>>>         skb->protocol = htons(ETH_P_MPLS_UC);
>>>>>
>>>>>         if (unlikely(!new_header_size && dec.bos)) {
>>>>> -               /* Penultimate hop popping */
>>>>>                 if (!mpls_egress(rt, skb, dec))
>>>>>                         goto drop;
>>>>>         } else {

* [PATCH net-next v3 0/4] mpls: Behaviour-changing improvements
  2015-03-20 15:42 ` [PATCH net-next v2 0/5] mpls: Behaviour-changing improvements Robert Shearman
                     ` (4 preceding siblings ...)
  2015-03-20 15:42   ` [PATCH net-next v2 5/5] mpls: Allow payload type to be associated with label routes Robert Shearman
@ 2015-03-30 18:15   ` Robert Shearman
  2015-03-30 18:15     ` [PATCH net-next v3 1/4] mpls: Use definition for reserved label checks Robert Shearman
                       ` (6 more replies)
  5 siblings, 7 replies; 68+ messages in thread
From: Robert Shearman @ 2015-03-30 18:15 UTC (permalink / raw)
  To: davem; +Cc: netdev, Robert Shearman

This series consists of several small changes to make it easier to
understand the code, along with security and RFC-compliance
changes. These are important to consider before userspace begins
relying on the previous behaviour.

V2:
  - Dropped PHP comment patch to avoid holding up the rest of the
    changes due to quibbling on nomenclature.
  - Corrected reference to RFC 3031 in commit message of patch
    2. Added reference to RFC 3031 s4.1.5 for PHP behaviour.
  - s/forwarding/input/ in patch 3.
  - Made MPT_IPV4 and MPT_IPV6 equal to 4 and 6 respectively in patch
    4, eliminating a switch on the version number as suggested by
    review comments. Added back references to RFCs, but moved them to
    mpls_payload_type enum declaration.
V1:
  - Updated to reference the correct RFC in the first patch.

Robert Shearman (4):
  mpls: Use definition for reserved label checks
  mpls: Differentiate implicit-null and unlabeled neighbours
  mpls: Per-device enabling of packet input
  mpls: Allow payload type to be associated with label routes

 Documentation/networking/mpls-sysctl.txt |   9 ++
 include/linux/netdevice.h                |   4 +
 net/mpls/af_mpls.c                       | 226 +++++++++++++++++++++++++------
 net/mpls/internal.h                      |   7 +
 4 files changed, 203 insertions(+), 43 deletions(-)

-- 
2.1.4

* [PATCH net-next v3 1/4] mpls: Use definition for reserved label checks
  2015-03-30 18:15   ` [PATCH net-next v3 0/4] mpls: Behaviour-changing improvements Robert Shearman
@ 2015-03-30 18:15     ` Robert Shearman
  2015-03-30 18:15     ` [PATCH net-next v3 2/4] mpls: Differentiate implicit-null and unlabeled neighbours Robert Shearman
                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-03-30 18:15 UTC (permalink / raw)
  To: davem; +Cc: netdev, Robert Shearman

In multiple locations there are checks for whether the label in hand
is a reserved label or not using the arbitrary value of 16. Factor
this out into a #define for better maintainability and for
documentation.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 net/mpls/af_mpls.c  | 20 ++++++++++----------
 net/mpls/internal.h |  1 +
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index db8a2ea6d4de..0d6763a895d6 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -276,7 +276,7 @@ static void mpls_notify_route(struct net *net, unsigned index,
 	struct mpls_route *rt = new ? new : old;
 	unsigned nlm_flags = (old && new) ? NLM_F_REPLACE : 0;
 	/* Ignore reserved labels for now */
-	if (rt && (index >= 16))
+	if (rt && (index >= LABEL_FIRST_UNRESERVED))
 		rtmsg_lfib(event, index, rt, nlh, net, portid, nlm_flags);
 }
 
@@ -310,7 +310,7 @@ static unsigned find_free_label(struct net *net)
 
 	platform_label = rtnl_dereference(net->mpls.platform_label);
 	platform_labels = net->mpls.platform_labels;
-	for (index = 16; index < platform_labels; index++) {
+	for (index = LABEL_FIRST_UNRESERVED; index < platform_labels; index++) {
 		if (!rtnl_dereference(platform_label[index]))
 			return index;
 	}
@@ -335,8 +335,8 @@ static int mpls_route_add(struct mpls_route_config *cfg)
 		index = find_free_label(net);
 	}
 
-	/* The first 16 labels are reserved, and may not be set */
-	if (index < 16)
+	/* Reserved labels may not be set */
+	if (index < LABEL_FIRST_UNRESERVED)
 		goto errout;
 
 	/* The full 20 bit range may not be supported. */
@@ -413,8 +413,8 @@ static int mpls_route_del(struct mpls_route_config *cfg)
 
 	index = cfg->rc_label;
 
-	/* The first 16 labels are reserved, and may not be removed */
-	if (index < 16)
+	/* Reserved labels may not be removed */
+	if (index < LABEL_FIRST_UNRESERVED)
 		goto errout;
 
 	/* The full 20 bit range may not be supported */
@@ -610,8 +610,8 @@ static int rtm_to_route_config(struct sk_buff *skb,  struct nlmsghdr *nlh,
 					   &cfg->rc_label))
 				goto errout;
 
-			/* The first 16 labels are reserved, and may not be set */
-			if (cfg->rc_label < 16)
+			/* Reserved labels may not be set */
+			if (cfg->rc_label < LABEL_FIRST_UNRESERVED)
 				goto errout;
 
 			break;
@@ -736,8 +736,8 @@ static int mpls_dump_routes(struct sk_buff *skb, struct netlink_callback *cb)
 	ASSERT_RTNL();
 
 	index = cb->args[0];
-	if (index < 16)
-		index = 16;
+	if (index < LABEL_FIRST_UNRESERVED)
+		index = LABEL_FIRST_UNRESERVED;
 
 	platform_label = rtnl_dereference(net->mpls.platform_label);
 	platform_labels = net->mpls.platform_labels;
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index fb6de92052c4..5732283ee1b9 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -9,6 +9,7 @@
 #define LABEL_GAL			13 /* RFC5586 */
 #define LABEL_OAM_ALERT			14 /* RFC3429 */
 #define LABEL_EXTENSION			15 /* RFC7274 */
+#define LABEL_FIRST_UNRESERVED		16 /* RFC3032 */
 
 
 struct mpls_shim_hdr {
-- 
2.1.4

* [PATCH net-next v3 2/4] mpls: Differentiate implicit-null and unlabeled neighbours
  2015-03-30 18:15   ` [PATCH net-next v3 0/4] mpls: Behaviour-changing improvements Robert Shearman
  2015-03-30 18:15     ` [PATCH net-next v3 1/4] mpls: Use definition for reserved label checks Robert Shearman
@ 2015-03-30 18:15     ` Robert Shearman
  2015-04-07 16:56       ` Eric W. Biederman
  2015-03-30 18:15     ` [PATCH net-next v3 3/4] mpls: Per-device enabling of packet input Robert Shearman
                       ` (4 subsequent siblings)
  6 siblings, 1 reply; 68+ messages in thread
From: Robert Shearman @ 2015-03-30 18:15 UTC (permalink / raw)
  To: davem; +Cc: netdev, Robert Shearman, Eric W. Biederman

The control plane can advertise labels for neighbours that don't have
an outgoing label. RFC 3031 s3.22 states that either the remaining
labels should be popped (if the control plane can determine that it's
safe to do so, which in light of MPLS-VPN, RFC 4364, is never the case
now) or that the packet should be discarded.

Therefore, if the peer is unlabeled and the last label wasn't popped
then drop the packet. The peer being unlabeled is signalled by an
empty label stack. However, penultimate hop popping still needs to be
supported (RFC 3031 s4.1.5) where the incoming label is popped and no
labels are put on and the packet can still go out labeled with the
remainder of the stack. This is achieved by the control plane
specifying a label stack consisting of the single special
implicit-null value.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 net/mpls/af_mpls.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 0d6763a895d6..7f5f30d29f73 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -28,7 +28,8 @@ struct mpls_route { /* next hop label forwarding entry */
 	struct rcu_head		rt_rcu;
 	u32			rt_label[MAX_NEW_LABELS];
 	u8			rt_protocol; /* routing protocol that set this entry */
-	u8			rt_labels;
+	u8                      rt_unlabeled : 1;
+	u8			rt_labels : 7;
 	u8			rt_via_alen;
 	u8			rt_via_table;
 	u8			rt_via[0];
@@ -202,6 +203,11 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 		/* Penultimate hop popping */
 		if (!mpls_egress(rt, skb, dec))
 			goto drop;
+	} else if (rt->rt_unlabeled) {
+		/* Labeled traffic destined to unlabeled peer should
+		 * be discarded
+		 */
+		goto drop;
 	} else {
 		bool bos;
 		int i;
@@ -386,9 +392,16 @@ static int mpls_route_add(struct mpls_route_config *cfg)
 	if (!rt)
 		goto errout;
 
-	rt->rt_labels = cfg->rc_output_labels;
-	for (i = 0; i < rt->rt_labels; i++)
-		rt->rt_label[i] = cfg->rc_output_label[i];
+	if (cfg->rc_output_labels == 1 &&
+	    cfg->rc_output_label[0] == LABEL_IMPLICIT_NULL) {
+		rt->rt_labels = 0;
+	} else {
+		rt->rt_labels = cfg->rc_output_labels;
+		for (i = 0; i < rt->rt_labels; i++)
+			rt->rt_label[i] = cfg->rc_output_label[i];
+		if (!rt->rt_labels)
+			rt->rt_unlabeled = true;
+	}
 	rt->rt_protocol = cfg->rc_protocol;
 	RCU_INIT_POINTER(rt->rt_dev, dev);
 	rt->rt_via_table = cfg->rc_via_table;
-- 
2.1.4
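
The distinction drawn in this patch between an implicit-null next hop and
an unlabeled neighbour can be modelled outside the kernel. The following is
a minimal user-space sketch, not the kernel code: `struct route_model`,
`route_from_config` and `classify` are hypothetical names, and the value 2
for MAX_NEW_LABELS is an assumption made only for illustration.
LABEL_IMPLICIT_NULL is 3, as defined by RFC 3032.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LABEL_IMPLICIT_NULL	3	/* RFC 3032 */
#define MAX_NEW_LABELS		2	/* assumed value, for illustration */

enum fwd_action {
	FWD_EGRESS,	/* last label popped: hand the payload to egress */
	FWD_DROP,	/* labels remain but the peer is unlabeled */
	FWD_LABELED,	/* forward with the remaining/new label stack */
};

struct route_model {
	uint32_t labels[MAX_NEW_LABELS];
	unsigned int n_labels;
	bool unlabeled;
};

/* Mirror of the route-add logic in the patch: a single implicit-null
 * output label means "pop and push nothing"; an empty output stack
 * means the neighbour is unlabeled.
 */
static struct route_model route_from_config(const uint32_t *out,
					     unsigned int n)
{
	struct route_model rt = { .n_labels = 0, .unlabeled = false };
	unsigned int i;

	assert(n <= MAX_NEW_LABELS);
	if (n == 1 && out[0] == LABEL_IMPLICIT_NULL)
		return rt;

	for (i = 0; i < n; i++)
		rt.labels[i] = out[i];
	rt.n_labels = n;
	rt.unlabeled = (n == 0);
	return rt;
}

/* Mirror of the forwarding decision: popped_bos says whether the label
 * just popped carried the bottom-of-stack bit.
 */
static enum fwd_action classify(const struct route_model *rt, bool popped_bos)
{
	if (rt->n_labels == 0 && popped_bos)
		return FWD_EGRESS;
	if (rt->unlabeled)
		return FWD_DROP;
	return FWD_LABELED;
}
```

In this model an implicit-null route with further labels on the stack still
forwards labeled traffic (penultimate hop popping), while a route with an
empty output stack drops anything that is still labeled after the pop.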

* [PATCH net-next v3 3/4] mpls: Per-device enabling of packet input
  2015-03-30 18:15   ` [PATCH net-next v3 0/4] mpls: Behaviour-changing improvements Robert Shearman
  2015-03-30 18:15     ` [PATCH net-next v3 1/4] mpls: Use definition for reserved label checks Robert Shearman
  2015-03-30 18:15     ` [PATCH net-next v3 2/4] mpls: Differentiate implicit-null and unlabeled neighbours Robert Shearman
@ 2015-03-30 18:15     ` Robert Shearman
  2015-04-07 17:02       ` Eric W. Biederman
  2015-03-30 18:15     ` [PATCH net-next v3 4/4] mpls: Allow payload type to be associated with label routes Robert Shearman
                       ` (3 subsequent siblings)
  6 siblings, 1 reply; 68+ messages in thread
From: Robert Shearman @ 2015-03-30 18:15 UTC (permalink / raw)
  To: davem; +Cc: netdev, Robert Shearman, Eric W. Biederman

An MPLS network is a single trust domain where the edges must be in
control of what labels make their way into the core. The simplest way
of ensuring this is for the edge device to always impose the labels,
and not to forward labeled traffic from untrusted neighbours. This is
achieved by allowing a per-device configuration of whether MPLS
traffic input from that interface should be processed or not.

To be secure by default, MPLS is now initially disabled on all
interfaces (except the loopback) until explicitly enabled and no
global option is provided to change the default. Whilst this differs
from other protocols (e.g. IPv6), network operators are used to
explicitly enabling MPLS forwarding on interfaces, and with the number
of links to the MPLS core typically fairly low this doesn't present
too much of a burden on operators.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 Documentation/networking/mpls-sysctl.txt |   9 +++
 include/linux/netdevice.h                |   4 ++
 net/mpls/af_mpls.c                       | 115 ++++++++++++++++++++++++++++++-
 net/mpls/internal.h                      |   6 ++
 4 files changed, 133 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/mpls-sysctl.txt b/Documentation/networking/mpls-sysctl.txt
index 639ddf0ece9b..9ed15f86c17c 100644
--- a/Documentation/networking/mpls-sysctl.txt
+++ b/Documentation/networking/mpls-sysctl.txt
@@ -18,3 +18,12 @@ platform_labels - INTEGER
 
 	Possible values: 0 - 1048575
 	Default: 0
+
+conf/<interface>/input - BOOL
+	Control whether packets can be input on this interface.
+
+	If disabled, packets will be discarded without further
+	processing.
+
+	0 - disabled (default)
+	not 0 - enabled
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 76951c5fbedf..ee4ca06375c8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -60,6 +60,7 @@ struct phy_device;
 struct wireless_dev;
 /* 802.15.4 specific */
 struct wpan_dev;
+struct mpls_dev;
 
 void netdev_set_default_ethtool_ops(struct net_device *dev,
 				    const struct ethtool_ops *ops);
@@ -1615,6 +1616,9 @@ struct net_device {
 	void			*ax25_ptr;
 	struct wireless_dev	*ieee80211_ptr;
 	struct wpan_dev		*ieee802154_ptr;
+#if IS_ENABLED(CONFIG_MPLS_ROUTING)
+	struct mpls_dev __rcu	*mpls_ptr;
+#endif
 
 /*
  * Cache lines mostly used on receive path (including eth_type_trans())
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 7f5f30d29f73..0b0420bf110d 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -54,6 +54,11 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
 	return rt;
 }
 
+static inline struct mpls_dev *mpls_dev_get(const struct net_device *dev)
+{
+	return rcu_dereference_rtnl(dev->mpls_ptr);
+}
+
 static bool mpls_output_possible(const struct net_device *dev)
 {
 	return dev && (dev->flags & IFF_UP) && netif_carrier_ok(dev);
@@ -137,6 +142,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	struct mpls_route *rt;
 	struct mpls_entry_decoded dec;
 	struct net_device *out_dev;
+	struct mpls_dev *mdev;
 	unsigned int hh_len;
 	unsigned int new_header_size;
 	unsigned int mtu;
@@ -144,6 +150,10 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 
 	/* Careful this entire function runs inside of an rcu critical section */
 
+	mdev = mpls_dev_get(dev);
+	if (!mdev || !mdev->input_enabled)
+		goto drop;
+
 	if (skb->pkt_type != PACKET_HOST)
 		goto drop;
 
@@ -441,10 +451,96 @@ errout:
 	return err;
 }
 
+#define MPLS_PERDEV_SYSCTL_OFFSET(field)	\
+	(&((struct mpls_dev *)0)->field)
+
+static const struct ctl_table mpls_dev_table[] = {
+	{
+		.procname	= "input",
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+		.data		= MPLS_PERDEV_SYSCTL_OFFSET(input_enabled),
+	},
+	{ }
+};
+
+static int mpls_dev_sysctl_register(struct net_device *dev,
+				    struct mpls_dev *mdev)
+{
+	char path[sizeof("net/mpls/conf/") + IFNAMSIZ];
+	struct ctl_table *table;
+	int i;
+
+	table = kmemdup(&mpls_dev_table, sizeof(mpls_dev_table), GFP_KERNEL);
+	if (!table)
+		goto out;
+
+	/* Table data contains only offsets relative to the base of
+	 * the mdev at this point, so make them absolute.
+	 */
+	for (i = 0; i < ARRAY_SIZE(mpls_dev_table); i++)
+		table[i].data = (char *)mdev + (uintptr_t)table[i].data;
+
+	snprintf(path, sizeof(path), "net/mpls/conf/%s", dev->name);
+
+	mdev->sysctl = register_net_sysctl(dev_net(dev), path, table);
+	if (!mdev->sysctl)
+		goto free;
+
+	return 0;
+
+free:
+	kfree(table);
+out:
+	return -ENOBUFS;
+}
+
+static void mpls_dev_sysctl_unregister(struct mpls_dev *mdev)
+{
+	struct ctl_table *table;
+
+	table = mdev->sysctl->ctl_table_arg;
+	unregister_net_sysctl_table(mdev->sysctl);
+	kfree(table);
+}
+
+static struct mpls_dev *mpls_add_dev(struct net_device *dev)
+{
+	struct mpls_dev *mdev;
+	int err = -ENOMEM;
+
+	ASSERT_RTNL();
+
+	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
+	if (!mdev)
+		return ERR_PTR(err);
+
+	/* Enable MPLS by default on loopback devices, since this
+	 * doesn't represent a security boundary and is required for the
+	 * lookup of inner labels for LSPs terminating on this router.
+	 */
+	if (dev->flags & IFF_LOOPBACK)
+		mdev->input_enabled = 1;
+
+	err = mpls_dev_sysctl_register(dev, mdev);
+	if (err)
+		goto free;
+
+	rcu_assign_pointer(dev->mpls_ptr, mdev);
+
+	return mdev;
+
+free:
+	kfree(mdev);
+	return ERR_PTR(err);
+}
+
 static void mpls_ifdown(struct net_device *dev)
 {
 	struct mpls_route __rcu **platform_label;
 	struct net *net = dev_net(dev);
+	struct mpls_dev *mdev;
 	unsigned index;
 
 	platform_label = rtnl_dereference(net->mpls.platform_label);
@@ -456,14 +552,31 @@ static void mpls_ifdown(struct net_device *dev)
 			continue;
 		rt->rt_dev = NULL;
 	}
+
+	mdev = mpls_dev_get(dev);
+	if (!mdev)
+		return;
+
+	mpls_dev_sysctl_unregister(mdev);
+
+	RCU_INIT_POINTER(dev->mpls_ptr, NULL);
+
+	kfree(mdev);
 }
 
 static int mpls_dev_notify(struct notifier_block *this, unsigned long event,
 			   void *ptr)
 {
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct mpls_dev *mdev;
 
 	switch(event) {
+	case NETDEV_REGISTER:
+		mdev = mpls_add_dev(dev);
+		if (IS_ERR(mdev))
+			return notifier_from_errno(PTR_ERR(mdev));
+		break;
+
 	case NETDEV_UNREGISTER:
 		mpls_ifdown(dev);
 		break;
@@ -925,7 +1038,7 @@ static int mpls_platform_labels(struct ctl_table *table, int write,
 	return ret;
 }
 
-static struct ctl_table mpls_table[] = {
+static const struct ctl_table mpls_table[] = {
 	{
 		.procname	= "platform_labels",
 		.data		= NULL,
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index 5732283ee1b9..d0aad5e9a2c9 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -23,6 +23,12 @@ struct mpls_entry_decoded {
 	u8 bos;
 };
 
+struct mpls_dev {
+	int			input_enabled;
+
+	struct ctl_table_header *sysctl;
+};
+
 struct sk_buff;
 
 static inline struct mpls_shim_hdr *mpls_hdr(const struct sk_buff *skb)
-- 
2.1.4
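
With this patch applied, an operator has to enable MPLS input per
interface through the new sysctl. The following is a minimal user-space
sketch of doing so from C, assuming sysctls are exposed under the usual
/proc/sys mount, so that the "net/mpls/conf/<interface>/input" entry
registered by the patch appears as a file there. The helper names
`mpls_input_sysctl_path` and `mpls_input_set` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the procfs path of the per-interface "input" sysctl.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int mpls_input_sysctl_path(char *buf, size_t len, const char *ifname)
{
	int n;

	assert(buf != NULL && ifname != NULL);
	n = snprintf(buf, len, "/proc/sys/net/mpls/conf/%s/input", ifname);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write 0 or 1 to the sysctl. Returns 0 on success, -1 on any error
 * (no such interface, MPLS not available, insufficient privileges).
 */
static int mpls_input_set(const char *ifname, int enabled)
{
	char path[128];
	FILE *f;
	int ok;

	if (mpls_input_sysctl_path(path, sizeof(path), ifname))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fprintf(f, "%d\n", enabled ? 1 : 0) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

Calling `mpls_input_set("eth1", 1)` as a privileged user would correspond
to enabling labeled packet input on eth1; the interface name is only an
example.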

* [PATCH net-next v3 4/4] mpls: Allow payload type to be associated with label routes
  2015-03-30 18:15   ` [PATCH net-next v3 0/4] mpls: Behaviour-changing improvements Robert Shearman
                       ` (2 preceding siblings ...)
  2015-03-30 18:15     ` [PATCH net-next v3 3/4] mpls: Per-device enabling of packet input Robert Shearman
@ 2015-03-30 18:15     ` Robert Shearman
  2015-04-07 17:19       ` Eric W. Biederman
  2015-04-01 19:30     ` [PATCH net-next v3 0/4] mpls: Behaviour-changing improvements David Miller
                       ` (2 subsequent siblings)
  6 siblings, 1 reply; 68+ messages in thread
From: Robert Shearman @ 2015-03-30 18:15 UTC (permalink / raw)
  To: davem; +Cc: netdev, Robert Shearman, Eric W. Biederman

RFC 4182 s2 states that if an IPv4 Explicit NULL label is the only
label on the stack, then after popping the resulting packet must be
treated as an IPv4 packet and forwarded based on the IPv4 header. The
same is true for IPv6 Explicit NULL with an IPv6 packet following.

Therefore, when installing the IPv4/IPv6 Explicit NULL label routes,
add an attribute that specifies the expected payload type for use at
forwarding time for determining the type of the encapsulated packet
instead of inspecting the first nibble of the packet.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 net/mpls/af_mpls.c | 72 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 43 insertions(+), 29 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 0b0420bf110d..e9ce5799449d 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -23,13 +23,25 @@
 /* This maximum ha length copied from the definition of struct neighbour */
 #define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
 
+enum mpls_payload_type {
+	MPT_UNSPEC, /* IPv4 or IPv6 */
+	MPT_IPV4 = 4,
+	MPT_IPV6 = 6,
+
+	/* Other types not implemented:
+	 *  - Pseudo-wire with or without control word (RFC4385)
+	 *  - GAL (RFC5586)
+	 */
+};
+
 struct mpls_route { /* next hop label forwarding entry */
 	struct net_device __rcu *rt_dev;
 	struct rcu_head		rt_rcu;
 	u32			rt_label[MAX_NEW_LABELS];
 	u8			rt_protocol; /* routing protocol that set this entry */
 	u8                      rt_unlabeled : 1;
-	u8			rt_labels : 7;
+	u8                      rt_payload_type : 3;
+	u8			rt_labels : 4;
 	u8			rt_via_alen;
 	u8			rt_via_table;
 	u8			rt_via[0];
@@ -90,16 +102,7 @@ static bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
 			struct mpls_entry_decoded dec)
 {
-	/* RFC4385 and RFC5586 encode other packets in mpls such that
-	 * they don't conflict with the ip version number, making
-	 * decoding by examining the ip version correct in everything
-	 * except for the strangest cases.
-	 *
-	 * The strange cases if we choose to support them will require
-	 * manual configuration.
-	 */
-	struct iphdr *hdr4;
-	bool success = true;
+	enum mpls_payload_type payload_type;
 
 	/* The IPv4 code below accesses through the IPv4 header
 	 * checksum, which is 12 bytes into the packet.
@@ -114,24 +117,31 @@ static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
 	if (!pskb_may_pull(skb, 12))
 		return false;
 
-	/* Use ip_hdr to find the ip protocol version */
-	hdr4 = ip_hdr(skb);
-	if (hdr4->version == 4) {
+	payload_type = rt->rt_payload_type;
+	if (payload_type == MPT_UNSPEC)
+		payload_type = ip_hdr(skb)->version;
+
+	switch (payload_type) {
+	case MPT_IPV4: {
+		struct iphdr *hdr4 = ip_hdr(skb);
 		skb->protocol = htons(ETH_P_IP);
 		csum_replace2(&hdr4->check,
 			      htons(hdr4->ttl << 8),
 			      htons(dec.ttl << 8));
 		hdr4->ttl = dec.ttl;
+		return true;
 	}
-	else if (hdr4->version == 6) {
+	case MPT_IPV6: {
 		struct ipv6hdr *hdr6 = ipv6_hdr(skb);
 		skb->protocol = htons(ETH_P_IPV6);
 		hdr6->hop_limit = dec.ttl;
+		return true;
+	}
+	case MPT_UNSPEC:
+		break;
 	}
-	else
-		/* version 0 and version 1 are used by pseudo wires */
-		success = false;
-	return success;
+
+	return false;
 }
 
 static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
@@ -254,16 +264,17 @@ static const struct nla_policy rtm_mpls_policy[RTA_MAX+1] = {
 };
 
 struct mpls_route_config {
-	u32		rc_protocol;
-	u32		rc_ifindex;
-	u16		rc_via_table;
-	u16		rc_via_alen;
-	u8		rc_via[MAX_VIA_ALEN];
-	u32		rc_label;
-	u32		rc_output_labels;
-	u32		rc_output_label[MAX_NEW_LABELS];
-	u32		rc_nlflags;
-	struct nl_info	rc_nlinfo;
+	u32			rc_protocol;
+	u32			rc_ifindex;
+	u16			rc_via_table;
+	u16			rc_via_alen;
+	u8			rc_via[MAX_VIA_ALEN];
+	u32			rc_label;
+	u32			rc_output_labels;
+	u32			rc_output_label[MAX_NEW_LABELS];
+	u32			rc_nlflags;
+	enum mpls_payload_type	rc_payload_type;
+	struct nl_info		rc_nlinfo;
 };
 
 static struct mpls_route *mpls_rt_alloc(size_t alen)
@@ -414,6 +425,7 @@ static int mpls_route_add(struct mpls_route_config *cfg)
 	}
 	rt->rt_protocol = cfg->rc_protocol;
 	RCU_INIT_POINTER(rt->rt_dev, dev);
+	rt->rt_payload_type = cfg->rc_payload_type;
 	rt->rt_via_table = cfg->rc_via_table;
 	memcpy(rt->rt_via, cfg->rc_via, cfg->rc_via_alen);
 
@@ -949,6 +961,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
 			goto nort0;
 		RCU_INIT_POINTER(rt0->rt_dev, lo);
 		rt0->rt_protocol = RTPROT_KERNEL;
+		rt0->rt_payload_type = MPT_IPV4;
 		rt0->rt_via_table = NEIGH_LINK_TABLE;
 		memcpy(rt0->rt_via, lo->dev_addr, lo->addr_len);
 	}
@@ -959,6 +972,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
 			goto nort2;
 		RCU_INIT_POINTER(rt2->rt_dev, lo);
 		rt2->rt_protocol = RTPROT_KERNEL;
+		rt2->rt_payload_type = MPT_IPV6;
 		rt2->rt_via_table = NEIGH_LINK_TABLE;
 		memcpy(rt2->rt_via, lo->dev_addr, lo->addr_len);
 	}
-- 
2.1.4


* Re: [PATCH net-next v3 0/4] mpls: Behaviour-changing improvements
  2015-03-30 18:15   ` [PATCH net-next v3 0/4] mpls: Behaviour-changing improvements Robert Shearman
                       ` (3 preceding siblings ...)
  2015-03-30 18:15     ` [PATCH net-next v3 4/4] mpls: Allow payload type to be associated with label routes Robert Shearman
@ 2015-04-01 19:30     ` David Miller
  2015-04-01 21:14       ` Eric W. Biederman
  2015-04-01 23:49       ` Robert Shearman
  2015-04-06 20:02     ` David Miller
  2015-04-14 22:44     ` [PATCH net-next v4 0/6] " Robert Shearman
  6 siblings, 2 replies; 68+ messages in thread
From: David Miller @ 2015-04-01 19:30 UTC (permalink / raw)
  To: rshearma; +Cc: netdev, ebiederm

From: Robert Shearman <rshearma@brocade.com>
Date: Mon, 30 Mar 2015 19:15:52 +0100

> This series consists of several small changes to make it easier to
> understand the code, along with security and RFC-compliance
> changes. These are important to consider before userspace begins
> relying on the previous behaviour.

Robert, you _absolutely_ must give common courtesy to Eric Biederman
and always CC: him explicitly on any proposed changes you want to make
to the new MPLS support.

I'm adding him here.

> V2:
>   - Dropped PHP comment patch to avoid holding up the rest of the
>     changes due to quibbling on nomenclature.
>   - Corrected reference to RFC 3031 in commit message of patch
>     2. Added reference to RFC 3031 s4.1.5 for PHP behaviour.
>   - s/forwarding/input/ in patch 3.
>   - Made MPT_IPV4 and MPT_IPV6 equal to 4 and 6 respectively in patch
>     4, eliminating a switch on the version number as suggested by
>     review comments. Added back references to RFCs, but moved them to
>     mpls_payload_type enum declaration.
> V1:
>   - Updated to reference the correct RFC in the first patch.
> 
> Robert Shearman (4):
>   mpls: Use definition for reserved label checks
>   mpls: Differentiate implicit-null and unlabeled neighbours
>   mpls: Per-device enabling of packet input
>   mpls: Allow payload type to be associated with label routes
> 
>  Documentation/networking/mpls-sysctl.txt |   9 ++
>  include/linux/netdevice.h                |   4 +
>  net/mpls/af_mpls.c                       | 226 +++++++++++++++++++++++++------
>  net/mpls/internal.h                      |   7 +
>  4 files changed, 203 insertions(+), 43 deletions(-)
> 
> -- 
> 2.1.4
> 


* Re: [PATCH net-next v3 0/4] mpls: Behaviour-changing improvements
  2015-04-01 19:30     ` [PATCH net-next v3 0/4] mpls: Behaviour-changing improvements David Miller
@ 2015-04-01 21:14       ` Eric W. Biederman
  2015-04-01 23:49       ` Robert Shearman
  1 sibling, 0 replies; 68+ messages in thread
From: Eric W. Biederman @ 2015-04-01 21:14 UTC (permalink / raw)
  To: David Miller; +Cc: rshearma, netdev

David Miller <davem@davemloft.net> writes:

> From: Robert Shearman <rshearma@brocade.com>
> Date: Mon, 30 Mar 2015 19:15:52 +0100
>
>> This series consists of several small changes to make it easier to
>> understand the code, along with security and RFC-compliance
>> changes. These are important to consider before userspace begins
>> relying on the previous behaviour.
>
> Robert, you _absolutely_ must give common courtesy to Eric Biederman
> and always CC: him explicitly on any proposed changes you want to make
> to the new MPLS support.
>
> I'm adding him here.

Thank you.  I have been pretty overloaded with catching up on
a couple of other things.  So I have not had as much time as I would
have liked to look at these changes.

Anyone looking at this today should really look at RFC7511 as it seems
relevant.

Eric


>> V2:
>>   - Dropped PHP comment patch to avoid holding up the rest of the
>>     changes due to quibbling on nomenclature.
>>   - Corrected reference to RFC 3031 in commit message of patch
>>     2. Added reference to RFC 3031 s4.1.5 for PHP behaviour.
>>   - s/forwarding/input/ in patch 3.
>>   - Made MPT_IPV4 and MPT_IPV6 equal to 4 and 6 respectively in patch
>>     4, eliminating a switch on the version number as suggested by
>>     review comments. Added back references to RFCs, but moved them to
>>     mpls_payload_type enum declaration.
>> V1:
>>   - Updated to reference the correct RFC in the first patch.
>> 
>> Robert Shearman (4):
>>   mpls: Use definition for reserved label checks
>>   mpls: Differentiate implicit-null and unlabeled neighbours
>>   mpls: Per-device enabling of packet input
>>   mpls: Allow payload type to be associated with label routes
>> 
>>  Documentation/networking/mpls-sysctl.txt |   9 ++
>>  include/linux/netdevice.h                |   4 +
>>  net/mpls/af_mpls.c                       | 226 +++++++++++++++++++++++++------
>>  net/mpls/internal.h                      |   7 +
>>  4 files changed, 203 insertions(+), 43 deletions(-)
>> 
>> -- 
>> 2.1.4
>> 


* Re: [PATCH net-next v3 0/4] mpls: Behaviour-changing improvements
  2015-04-01 19:30     ` [PATCH net-next v3 0/4] mpls: Behaviour-changing improvements David Miller
  2015-04-01 21:14       ` Eric W. Biederman
@ 2015-04-01 23:49       ` Robert Shearman
  1 sibling, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-04-01 23:49 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, ebiederm

On 01/04/15 20:30, David Miller wrote:
> From: Robert Shearman <rshearma@brocade.com>
> Date: Mon, 30 Mar 2015 19:15:52 +0100
>
>> This series consists of several small changes to make it easier to
>> understand the code, along with security and RFC-compliance
>> changes. These are important to consider before userspace begins
>> relying on the previous behaviour.
>
> Robert, you _absolutely_ must give common courtesy to Eric Biederman
> and always CC: him explicitly on any proposed changes you want to make
> to the new MPLS support.

Sorry for the oversight of not copying him on the cover letter. FWIW, he 
was copied on the individual patches so the intention was to give Eric 
that courtesy.

>
> I'm adding him here.
>
>> V2:
>>    - Dropped PHP comment patch to avoid holding up the rest of the
>>      changes due to quibbling on nomenclature.
>>    - Corrected reference to RFC 3031 in commit message of patch
>>      2. Added reference to RFC 3031 s4.1.5 for PHP behaviour.
>>    - s/forwarding/input/ in patch 3.
>>    - Made MPT_IPV4 and MPT_IPV6 equal to 4 and 6 respectively in patch
>>      4, eliminating a switch on the version number as suggested by
>>      review comments. Added back references to RFCs, but moved them to
>>      mpls_payload_type enum declaration.
>> V1:
>>    - Updated to reference the correct RFC in the first patch.
>>
>> Robert Shearman (4):
>>    mpls: Use definition for reserved label checks
>>    mpls: Differentiate implicit-null and unlabeled neighbours
>>    mpls: Per-device enabling of packet input
>>    mpls: Allow payload type to be associated with label routes
>>
>>   Documentation/networking/mpls-sysctl.txt |   9 ++
>>   include/linux/netdevice.h                |   4 +
>>   net/mpls/af_mpls.c                       | 226 +++++++++++++++++++++++++------
>>   net/mpls/internal.h                      |   7 +
>>   4 files changed, 203 insertions(+), 43 deletions(-)
>>
>> --
>> 2.1.4
>>


* Re: [PATCH net-next v3 0/4] mpls: Behaviour-changing improvements
  2015-03-30 18:15   ` [PATCH net-next v3 0/4] mpls: Behaviour-changing improvements Robert Shearman
                       ` (4 preceding siblings ...)
  2015-04-01 19:30     ` [PATCH net-next v3 0/4] mpls: Behaviour-changing improvements David Miller
@ 2015-04-06 20:02     ` David Miller
  2015-04-14 22:44     ` [PATCH net-next v4 0/6] " Robert Shearman
  6 siblings, 0 replies; 68+ messages in thread
From: David Miller @ 2015-04-06 20:02 UTC (permalink / raw)
  To: rshearma; +Cc: netdev, ebiederm

From: Robert Shearman <rshearma@brocade.com>
Date: Mon, 30 Mar 2015 19:15:52 +0100

> This series consists of several small changes to make it easier to
> understand the code, along with security and RFC-compliance
> changes. These are important to consider before userspace begins
> relying on the previous behaviour.
> 
> V2:
>   - Dropped PHP comment patch to avoid holding up the rest of the
>     changes due to quibbling on nomenclature.
>   - Corrected reference to RFC 3031 in commit message of patch
>     2. Added reference to RFC 3031 s4.1.5 for PHP behaviour.
>   - s/forwarding/input/ in patch 3.
>   - Made MPT_IPV4 and MPT_IPV6 equal to 4 and 6 respectively in patch
>     4, eliminating a switch on the version number as suggested by
>     review comments. Added back references to RFCs, but moved them to
>     mpls_payload_type enum declaration.
> V1:
>   - Updated to reference the correct RFC in the first patch.

Eric, I can't sit on this series much longer.  Please review this
at your next possible moment.

Thanks.


* Re: [PATCH net-next v3 2/4] mpls: Differentiate implicit-null and unlabeled neighbours
  2015-03-30 18:15     ` [PATCH net-next v3 2/4] mpls: Differentiate implicit-null and unlabeled neighbours Robert Shearman
@ 2015-04-07 16:56       ` Eric W. Biederman
  2015-04-08 17:08         ` Robert Shearman
  0 siblings, 1 reply; 68+ messages in thread
From: Eric W. Biederman @ 2015-04-07 16:56 UTC (permalink / raw)
  To: Robert Shearman; +Cc: davem, netdev

Robert Shearman <rshearma@brocade.com> writes:

> The control plane can advertise labels for neighbours that don't have
> an outgoing label. RFC 3031 s3.22 states that either the remaining
> labels should be popped (if the control plane can determine that it's
> safe to do so, which in light of MPLS-VPN, RFC 4364, is never the case
> now) or that the packet should be discarded.
>
> Therefore, if the peer is unlabeled and the last label wasn't popped
> then drop the packet. The peer being unlabeled is signalled by an
> empty label stack. However, penultimate hop popping still needs to be
> supported (RFC 3031 s4.1.5) where the incoming label is popped and no
> labels are put on and the packet can still go out labeled with the
> remainder of the stack. This is achieved by the control plane
> specifying a label stack consisting of the single special
> implicit-null value.

I disagree with this approach to limiting what can be in an mpls tunnel.
I agree that it would be nice to limit what is an mpls tunnel.

However I want the code and semantics as clean as we can make them.

So what I suggest is to add something like

RTA_PSEUDOWIRE

That has an integer for a type with values like:

PW_FRAME_RELAY_DLCI	0x0001
PW_ATM_AAL5_SDU		0x0002
PW_ATM_TRANSPARENT_CELL 0x0003
PW_ETHERNET_TAGGED	0x0004
PW_ETHERNET		0x0005
PW_HDLC			0x0006
PW_PPP			0x0007
PW_IP			0x000B

Roughly the values from the pseudo wire registry
http://www.iana.org/assignments/pwe3-parameters/pwe3-parameters.xhtml

That won't quite work because pseudo wires are a subset of what
can be transported over an MPLS network, and a superset of what
we implement in the kernel.  So we need a different identifier.

In passing I will note that the current implementation defaults to
pseudo wire type 0x000B IP layer2 transport, which can carry both ipv4
and ipv6 traffic, as well as a generic associated channel.  So rather
than being a weird exception to the rules, what I have actually
implemented is well enough specified that you can signal it.

So for the sake of argument let's call it:

RTA_MPLS_PAYLOAD_TYPE

And have values, something like.

#define MPLS_PL_IPV4		0x4
#define MPLS_PL_IPV6		0x6
#define MPLS_PL_MPLS		0x10
#define MPLS_PL_ETHERNET_TAGGED	0x14
#define MPLS_PL_ETHERNET	0x15
#define MPLS_PL_IP		0x1B


And have the semantics be that if you have forced the payload type with
the attribute and the packet does not match the specified payload, we
drop the packet.  Not having the BOS set for anything except
MPLS_PL_MPLS would be such an error and would cause the packet to
be dropped, and having BOS set for MPLS_PL_MPLS would also be an error.

Where I am defining MPLS_PL_MPLS to be the payload type of a label
that transports mpls traffic and is not expected to end at this node.

Although I am not certain that you care about the case I am describing
being handled by MPLS_PL_MPLS.
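
As a minimal sketch of the drop rule described above (not kernel code):
the enum values are the ones from the list above, while the function
name, its parameters, and the decision to treat MPLS_PL_IP as "IPv4 or
IPv6 only" are assumptions made for illustration.  The Ethernet types
and the generic associated channel are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Payload-type values copied from the proposal; not an existing kernel API. */
enum mpls_pl_type {
	MPLS_PL_IPV4 = 0x4,
	MPLS_PL_IPV6 = 0x6,
	MPLS_PL_MPLS = 0x10,
	MPLS_PL_IP   = 0x1B,
};

/* Hypothetical helper: return true if a packet may be accepted for a
 * route whose payload type has been forced.
 *   bos          - bottom-of-stack bit of the label just popped
 *   first_nibble - top 4 bits of the first byte after the label stack
 */
static bool mpls_pl_accept(enum mpls_pl_type type, bool bos,
			   uint8_t first_nibble)
{
	/* A label carrying MPLS must have more labels underneath it. */
	if (type == MPLS_PL_MPLS)
		return !bos;

	/* Every other payload type requires the bottom of stack. */
	if (!bos)
		return false;

	switch (type) {
	case MPLS_PL_IPV4:
		return first_nibble == 4;
	case MPLS_PL_IPV6:
		return first_nibble == 6;
	case MPLS_PL_IP:
		/* Assumption: only the IPv4/IPv6 cases are modelled here. */
		return first_nibble == 4 || first_nibble == 6;
	default:
		return false;
	}
}
```

A caller would drop the packet whenever this returns false.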

We should also refuse to accept labels with the implicit NULL set in the 
RTA_NEWDST attribute.

I have read through a bunch of RFCs and I have not seen your distinction
between implicit NULL and unlabeled show up anywhere quite the way you are
making it.   Regardless, what you are trying to do seems to be a
transference from a signalling protocol to the rtnetlink attributes that
mixes semantics.

The current rtnetlink attribute semantics are clean simple and easy to
understand.  When you see the label RTA_DST you remove it and you apply
the label RTA_NEWDST. 
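
To make that concrete, here is a small userspace model of "remove
RTA_DST, apply RTA_NEWDST" on a label stack.  The function name, the
array representation (outermost label at index 0) and MAX_STACK are
assumptions for illustration only; an empty RTA_NEWDST simply pops.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_STACK 8	/* illustrative capacity of the stack array */

/* Hypothetical model: replace the top label of stack[0..depth) with the
 * n_new labels in newdst (outermost first).  The stack array must have
 * room for MAX_STACK entries.  Returns the new depth, or -1 on error.
 */
static int mpls_apply_route(uint32_t *stack, int depth,
			    const uint32_t *newdst, int n_new)
{
	int rest = depth - 1;	/* labels left once RTA_DST is removed */

	if (depth < 1 || n_new < 0 || rest + n_new > MAX_STACK)
		return -1;

	/* Move the remaining labels to make room (or close the gap)... */
	memmove(stack + n_new, stack + 1, (size_t)rest * sizeof(*stack));
	/* ...then put the new labels on top. */
	if (n_new > 0)
		memcpy(stack, newdst, (size_t)n_new * sizeof(*stack));

	return rest + n_new;
}
```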

I find playing games with implicit NULL as opposed to using an
RTA_MPLS_PAYLOAD_TYPE type to be a mixing of unconnected things and likely
to lead to maintenance problems in the future.

Your reference to RFC3031 section 3.22 also does not work to justify
this behavior as RFC3031 is about receiving a packet whose topmost
label is not in the label table.

In short I think the packet handling semantics you are after are quite
reasonable but your approach is unnecessarily complicated and
confusing.

Eric


* Re: [PATCH net-next v3 3/4] mpls: Per-device enabling of packet input
  2015-03-30 18:15     ` [PATCH net-next v3 3/4] mpls: Per-device enabling of packet input Robert Shearman
@ 2015-04-07 17:02       ` Eric W. Biederman
  2015-04-08 14:29         ` Robert Shearman
  0 siblings, 1 reply; 68+ messages in thread
From: Eric W. Biederman @ 2015-04-07 17:02 UTC (permalink / raw)
  To: Robert Shearman; +Cc: davem, netdev

Robert Shearman <rshearma@brocade.com> writes:

> An MPLS network is a single trust domain where the edges must be in
> control of what labels make their way into the core. The simplest way
> of ensuring this is for the edge device to always impose the labels, and
> not forward labeled traffic from untrusted neighbours. This is
> achieved by allowing a per-device configuration of whether MPLS
> traffic input from that interface should be processed or not.
>
> To be secure by default, MPLS is now initially disabled on all
> interfaces (except the loopback) until explicitly enabled and no
> global option is provided to change the default. Whilst this differs
> from other protocols (e.g. IPv6), network operators are used to
> explicitly enabling MPLS forwarding on interfaces, and with the number
> of links to the MPLS core typically fairly low this doesn't present
> too much of a burden on operators.

This really could use breaking up into two patches.

1 patch that implements mpls_add_dev,
and a second patch that uses the struct mpls_dev to implement
the input bit.

As it stands we are currently allowing mpls attributes on devices that
we do not support the transport of mpls over.  And simply not being able
to find an mpls_dev would be a faster way to discard packets on those
devices.

Naming the attribute input clears up all of the semantic issues that I
had with the previous version of this patch.
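
A minimal userspace sketch of the check I have in mind, assuming that
unsupported devices simply never get an mpls_dev.  The *_model struct
names and the function name are hypothetical stand-ins, not the kernel
types, and RCU is ignored entirely.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for struct mpls_dev and struct net_device. */
struct mpls_dev_model {
	int input_enabled;
};

struct net_device_model {
	struct mpls_dev_model *mpls_ptr;	/* NULL: MPLS not supported here */
};

/* Hypothetical helper: a packet is only processed when the device has
 * MPLS state at all and input has been explicitly enabled on it.
 */
static bool mpls_input_allowed(const struct net_device_model *dev)
{
	const struct mpls_dev_model *mdev = dev->mpls_ptr;

	return mdev && mdev->input_enabled;
}
```

With that split, the missing-mpls_dev case and the input-disabled case
both end up on the same drop path.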

> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: Robert Shearman <rshearma@brocade.com>
> ---
>  Documentation/networking/mpls-sysctl.txt |   9 +++
>  include/linux/netdevice.h                |   4 ++
>  net/mpls/af_mpls.c                       | 115 ++++++++++++++++++++++++++++++-
>  net/mpls/internal.h                      |   6 ++
>  4 files changed, 133 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/networking/mpls-sysctl.txt b/Documentation/networking/mpls-sysctl.txt
> index 639ddf0ece9b..9ed15f86c17c 100644
> --- a/Documentation/networking/mpls-sysctl.txt
> +++ b/Documentation/networking/mpls-sysctl.txt
> @@ -18,3 +18,12 @@ platform_labels - INTEGER
>  
>  	Possible values: 0 - 1048575
>  	Default: 0
> +
> +conf/<interface>/input - BOOL
> +	Control whether packets can be input on this interface.
> +
> +	If disabled, packets will be discarded without further
> +	processing.
> +
> +	0 - disabled (default)
> +	not 0 - enabled
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 76951c5fbedf..ee4ca06375c8 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -60,6 +60,7 @@ struct phy_device;
>  struct wireless_dev;
>  /* 802.15.4 specific */
>  struct wpan_dev;
> +struct mpls_dev;
>  
>  void netdev_set_default_ethtool_ops(struct net_device *dev,
>  				    const struct ethtool_ops *ops);
> @@ -1615,6 +1616,9 @@ struct net_device {
>  	void			*ax25_ptr;
>  	struct wireless_dev	*ieee80211_ptr;
>  	struct wpan_dev		*ieee802154_ptr;
> +#if IS_ENABLED(CONFIG_MPLS_ROUTING)
> +	struct mpls_dev __rcu	*mpls_ptr;
> +#endif
>  
>  /*
>   * Cache lines mostly used on receive path (including eth_type_trans())
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index 7f5f30d29f73..0b0420bf110d 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -54,6 +54,11 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
>  	return rt;
>  }
>  
> +static inline struct mpls_dev *mpls_dev_get(const struct net_device *dev)
> +{
> +	return rcu_dereference_rtnl(dev->mpls_ptr);
> +}
> +
>  static bool mpls_output_possible(const struct net_device *dev)
>  {
>  	return dev && (dev->flags & IFF_UP) && netif_carrier_ok(dev);
> @@ -137,6 +142,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>  	struct mpls_route *rt;
>  	struct mpls_entry_decoded dec;
>  	struct net_device *out_dev;
> +	struct mpls_dev *mdev;
>  	unsigned int hh_len;
>  	unsigned int new_header_size;
>  	unsigned int mtu;
> @@ -144,6 +150,10 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>  
>  	/* Careful this entire function runs inside of an rcu critical section */
>  
> +	mdev = mpls_dev_get(dev);
> +	if (!mdev || !mdev->input_enabled)
> +		goto drop;
> +
>  	if (skb->pkt_type != PACKET_HOST)
>  		goto drop;
>  
> @@ -441,10 +451,96 @@ errout:
>  	return err;
>  }
>  
> +#define MPLS_PERDEV_SYSCTL_OFFSET(field)	\
> +	(&((struct mpls_dev *)0)->field)
> +
> +static const struct ctl_table mpls_dev_table[] = {
> +	{
> +		.procname	= "input",
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec,
> +		.data		= MPLS_PERDEV_SYSCTL_OFFSET(input_enabled),
> +	},
> +	{ }
> +};
> +
> +static int mpls_dev_sysctl_register(struct net_device *dev,
> +				    struct mpls_dev *mdev)
> +{
> +	char path[sizeof("net/mpls/conf/") + IFNAMSIZ];
> +	struct ctl_table *table;
> +	int i;
> +
> +	table = kmemdup(&mpls_dev_table, sizeof(mpls_dev_table), GFP_KERNEL);
> +	if (!table)
> +		goto out;
> +
> +	/* Table data contains only offsets relative to the base of
> +	 * the mdev at this point, so make them absolute.
> +	 */
> +	for (i = 0; i < ARRAY_SIZE(mpls_dev_table); i++)
> +		table[i].data = (char *)mdev + (uintptr_t)table[i].data;
> +
> +	snprintf(path, sizeof(path), "net/mpls/conf/%s", dev->name);
> +
> +	mdev->sysctl = register_net_sysctl(dev_net(dev), path, table);
> +	if (!mdev->sysctl)
> +		goto free;
> +
> +	return 0;
> +
> +free:
> +	kfree(table);
> +out:
> +	return -ENOBUFS;
> +}
> +
> +static void mpls_dev_sysctl_unregister(struct mpls_dev *mdev)
> +{
> +	struct ctl_table *table;
> +
> +	table = mdev->sysctl->ctl_table_arg;
> +	unregister_net_sysctl_table(mdev->sysctl);
> +	kfree(table);
> +}
> +
> +static struct mpls_dev *mpls_add_dev(struct net_device *dev)
> +{
> +	struct mpls_dev *mdev;
> +	int err = -ENOMEM;
> +
> +	ASSERT_RTNL();
> +
> +	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
> +	if (!mdev)
> +		return ERR_PTR(err);
> +
> +	/* Enable MPLS by default on loopback devices, since this
> +	 * doesn't represent a security boundary and is required for the
> +	 * lookup of inner labels for LSPs terminating on this router.
> +	 */
> +	if (dev->flags & IFF_LOOPBACK)
> +		mdev->input_enabled = 1;
> +
> +	err = mpls_dev_sysctl_register(dev, mdev);
> +	if (err)
> +		goto free;
> +
> +	rcu_assign_pointer(dev->mpls_ptr, mdev);
> +
> +	return mdev;
> +
> +free:
> +	kfree(mdev);
> +	return ERR_PTR(err);
> +}
> +
>  static void mpls_ifdown(struct net_device *dev)
>  {
>  	struct mpls_route __rcu **platform_label;
>  	struct net *net = dev_net(dev);
> +	struct mpls_dev *mdev;
>  	unsigned index;
>  
>  	platform_label = rtnl_dereference(net->mpls.platform_label);
> @@ -456,14 +552,31 @@ static void mpls_ifdown(struct net_device *dev)
>  			continue;
>  		rt->rt_dev = NULL;
>  	}
> +
> +	mdev = mpls_dev_get(dev);
> +	if (!mdev)
> +		return;
> +
> +	mpls_dev_sysctl_unregister(mdev);
> +
> +	RCU_INIT_POINTER(dev->mpls_ptr, NULL);
> +
> +	kfree(mdev);
>  }
>  
>  static int mpls_dev_notify(struct notifier_block *this, unsigned long event,
>  			   void *ptr)
>  {
>  	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> +	struct mpls_dev *mdev;
>  
>  	switch(event) {
> +	case NETDEV_REGISTER:
> +		mdev = mpls_add_dev(dev);
> +		if (IS_ERR(mdev))
> +			return notifier_from_errno(PTR_ERR(mdev));
> +		break;
> +
>  	case NETDEV_UNREGISTER:
>  		mpls_ifdown(dev);
>  		break;
> @@ -925,7 +1038,7 @@ static int mpls_platform_labels(struct ctl_table *table, int write,
>  	return ret;
>  }
>  
> -static struct ctl_table mpls_table[] = {
> +static const struct ctl_table mpls_table[] = {
>  	{
>  		.procname	= "platform_labels",
>  		.data		= NULL,
> diff --git a/net/mpls/internal.h b/net/mpls/internal.h
> index 5732283ee1b9..d0aad5e9a2c9 100644
> --- a/net/mpls/internal.h
> +++ b/net/mpls/internal.h
> @@ -23,6 +23,12 @@ struct mpls_entry_decoded {
>  	u8 bos;
>  };
>  
> +struct mpls_dev {
> +	int			input_enabled;
> +
> +	struct ctl_table_header *sysctl;
> +};
> +
>  struct sk_buff;
>  
>  static inline struct mpls_shim_hdr *mpls_hdr(const struct sk_buff *skb)


* Re: [PATCH net-next v3 4/4] mpls: Allow payload type to be associated with label routes
  2015-03-30 18:15     ` [PATCH net-next v3 4/4] mpls: Allow payload type to be associated with label routes Robert Shearman
@ 2015-04-07 17:19       ` Eric W. Biederman
  2015-04-08 14:03         ` Robert Shearman
  0 siblings, 1 reply; 68+ messages in thread
From: Eric W. Biederman @ 2015-04-07 17:19 UTC (permalink / raw)
  To: Robert Shearman; +Cc: davem, netdev

Robert Shearman <rshearma@brocade.com> writes:

> RFC 4182 s2 states that if an IPv4 Explicit NULL label is the only
> label on the stack, then after popping the resulting packet must be
> treated as a IPv4 packet and forwarded based on the IPv4 header. The
> same is true for IPv6 Explicit NULL with an IPv6 packet following.
>
> Therefore, when installing the IPv4/IPv6 Explicit NULL label routes,
> add an attribute that specifies the expected payload type for use at
> forwarding time for determining the type of the encapsulated packet
> instead of inspecting the first nibble of the packet.

This looks pretty reasonable.

I suspect the multiple returns instead of using a single variable
might generate slightly worse machine code, but whatever.
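
For reference, a userspace sketch of the single-variable shape.  This
is only a model of the dispatch in mpls_egress(): the function name and
parameters are made up for illustration, the TTL and checksum updates
are omitted, and the numeric values stand in for ETH_P_IP and
ETH_P_IPV6.

```c
#include <stdbool.h>
#include <stdint.h>

enum mpls_payload_type {
	MPT_UNSPEC = 0,	/* IPv4 or IPv6, decided from the packet */
	MPT_IPV4 = 4,
	MPT_IPV6 = 6,
};

/* Hypothetical model: pick the ethertype for the decapsulated payload.
 * first_byte is the first byte after the label stack.  Returns false
 * for payloads that are not recognised.
 */
static bool mpls_egress_model(enum mpls_payload_type rt_type,
			      uint8_t first_byte, uint16_t *ethertype)
{
	enum mpls_payload_type type = rt_type;
	bool success = true;

	/* No type configured on the route: fall back to the IP version. */
	if (type == MPT_UNSPEC)
		type = (enum mpls_payload_type)(first_byte >> 4);

	switch (type) {
	case MPT_IPV4:
		*ethertype = 0x0800;	/* ETH_P_IP */
		break;
	case MPT_IPV6:
		*ethertype = 0x86DD;	/* ETH_P_IPV6 */
		break;
	default:
		success = false;
		break;
	}

	return success;
}
```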

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>


> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: Robert Shearman <rshearma@brocade.com>
> ---
>  net/mpls/af_mpls.c | 72 ++++++++++++++++++++++++++++++++----------------------
>  1 file changed, 43 insertions(+), 29 deletions(-)
>
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index 0b0420bf110d..e9ce5799449d 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -23,13 +23,25 @@
>  /* This maximum ha length copied from the definition of struct neighbour */
>  #define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
>  
> +enum mpls_payload_type {
> +	MPT_UNSPEC, /* IPv4 or IPv6 */
> +	MPT_IPV4 = 4,
> +	MPT_IPV6 = 6,
> +
> +	/* Other types not implemented:
> +	 *  - Pseudo-wire with or without control word (RFC4385)
> +	 *  - GAL (RFC5586)
> +	 */
> +};
> +
>  struct mpls_route { /* next hop label forwarding entry */
>  	struct net_device __rcu *rt_dev;
>  	struct rcu_head		rt_rcu;
>  	u32			rt_label[MAX_NEW_LABELS];
>  	u8			rt_protocol; /* routing protocol that set this entry */
>  	u8                      rt_unlabeled : 1;
> -	u8			rt_labels : 7;
> +	u8                      rt_payload_type : 3;
> +	u8			rt_labels : 4;
>  	u8			rt_via_alen;
>  	u8			rt_via_table;
>  	u8			rt_via[0];
> @@ -90,16 +102,7 @@ static bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
>  static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
>  			struct mpls_entry_decoded dec)
>  {
> -	/* RFC4385 and RFC5586 encode other packets in mpls such that
> -	 * they don't conflict with the ip version number, making
> -	 * decoding by examining the ip version correct in everything
> -	 * except for the strangest cases.
> -	 *
> -	 * The strange cases if we choose to support them will require
> -	 * manual configuration.
> -	 */
> -	struct iphdr *hdr4;
> -	bool success = true;
> +	enum mpls_payload_type payload_type;
>  
>  	/* The IPv4 code below accesses through the IPv4 header
>  	 * checksum, which is 12 bytes into the packet.
> @@ -114,24 +117,31 @@ static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
>  	if (!pskb_may_pull(skb, 12))
>  		return false;
>  
> -	/* Use ip_hdr to find the ip protocol version */
> -	hdr4 = ip_hdr(skb);
> -	if (hdr4->version == 4) {
> +	payload_type = rt->rt_payload_type;
> +	if (payload_type == MPT_UNSPEC)
> +		payload_type = ip_hdr(skb)->version;
> +
> +	switch (payload_type) {
> +	case MPT_IPV4: {
> +		struct iphdr *hdr4 = ip_hdr(skb);
>  		skb->protocol = htons(ETH_P_IP);
>  		csum_replace2(&hdr4->check,
>  			      htons(hdr4->ttl << 8),
>  			      htons(dec.ttl << 8));
>  		hdr4->ttl = dec.ttl;
> +		return true;
>  	}
> -	else if (hdr4->version == 6) {
> +	case MPT_IPV6: {
>  		struct ipv6hdr *hdr6 = ipv6_hdr(skb);
>  		skb->protocol = htons(ETH_P_IPV6);
>  		hdr6->hop_limit = dec.ttl;
> +		return true;
> +	}
> +	case MPT_UNSPEC:
> +		break;
>  	}
> -	else
> -		/* version 0 and version 1 are used by pseudo wires */
> -		success = false;
> -	return success;
> +
> +	return false;
>  }
>  
>  static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
> @@ -254,16 +264,17 @@ static const struct nla_policy rtm_mpls_policy[RTA_MAX+1] = {
>  };
>  
>  struct mpls_route_config {
> -	u32		rc_protocol;
> -	u32		rc_ifindex;
> -	u16		rc_via_table;
> -	u16		rc_via_alen;
> -	u8		rc_via[MAX_VIA_ALEN];
> -	u32		rc_label;
> -	u32		rc_output_labels;
> -	u32		rc_output_label[MAX_NEW_LABELS];
> -	u32		rc_nlflags;
> -	struct nl_info	rc_nlinfo;
> +	u32			rc_protocol;
> +	u32			rc_ifindex;
> +	u16			rc_via_table;
> +	u16			rc_via_alen;
> +	u8			rc_via[MAX_VIA_ALEN];
> +	u32			rc_label;
> +	u32			rc_output_labels;
> +	u32			rc_output_label[MAX_NEW_LABELS];
> +	u32			rc_nlflags;
> +	enum mpls_payload_type	rc_payload_type;
> +	struct nl_info		rc_nlinfo;
>  };
>  
>  static struct mpls_route *mpls_rt_alloc(size_t alen)
> @@ -414,6 +425,7 @@ static int mpls_route_add(struct mpls_route_config *cfg)
>  	}
>  	rt->rt_protocol = cfg->rc_protocol;
>  	RCU_INIT_POINTER(rt->rt_dev, dev);
> +	rt->rt_payload_type = cfg->rc_payload_type;
>  	rt->rt_via_table = cfg->rc_via_table;
>  	memcpy(rt->rt_via, cfg->rc_via, cfg->rc_via_alen);
>  
> @@ -949,6 +961,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
>  			goto nort0;
>  		RCU_INIT_POINTER(rt0->rt_dev, lo);
>  		rt0->rt_protocol = RTPROT_KERNEL;
> +		rt0->rt_payload_type = MPT_IPV4;
>  		rt0->rt_via_table = NEIGH_LINK_TABLE;
>  		memcpy(rt0->rt_via, lo->dev_addr, lo->addr_len);
>  	}
> @@ -959,6 +972,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
>  			goto nort2;
>  		RCU_INIT_POINTER(rt2->rt_dev, lo);
>  		rt2->rt_protocol = RTPROT_KERNEL;
> +		rt2->rt_payload_type = MPT_IPV6;
>  		rt2->rt_via_table = NEIGH_LINK_TABLE;
>  		memcpy(rt2->rt_via, lo->dev_addr, lo->addr_len);
>  	}


* Re: [PATCH net-next v3 4/4] mpls: Allow payload type to be associated with label routes
  2015-04-07 17:19       ` Eric W. Biederman
@ 2015-04-08 14:03         ` Robert Shearman
  0 siblings, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-04-08 14:03 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: davem, netdev

On 07/04/15 18:19, Eric W. Biederman wrote:
> Robert Shearman <rshearma@brocade.com> writes:
>
>> RFC 4182 s2 states that if an IPv4 Explicit NULL label is the only
>> label on the stack, then after popping the resulting packet must be
>> treated as a IPv4 packet and forwarded based on the IPv4 header. The
>> same is true for IPv6 Explicit NULL with an IPv6 packet following.
>>
>> Therefore, when installing the IPv4/IPv6 Explicit NULL label routes,
>> add an attribute that specifies the expected payload type for use at
>> forwarding time for determining the type of the encapsulated packet
>> instead of inspecting the first nibble of the packet.
>
> This looks pretty reasonable.
>
> I suspect the multiple returns instead of using a single variable
> might generate slightly worse machine code, but whatever.

That's a good point - the way the changes are structured now means that 
the removal of the local variable doesn't add anything, so if it's OK 
with you I'll change that.

>
> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

Thanks for the review Eric.

Rob

>
>
>> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
>> Signed-off-by: Robert Shearman <rshearma@brocade.com>
>> ---
>>   net/mpls/af_mpls.c | 72 ++++++++++++++++++++++++++++++++----------------------
>>   1 file changed, 43 insertions(+), 29 deletions(-)
>>
>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>> index 0b0420bf110d..e9ce5799449d 100644
>> --- a/net/mpls/af_mpls.c
>> +++ b/net/mpls/af_mpls.c
>> @@ -23,13 +23,25 @@
>>   /* This maximum ha length copied from the definition of struct neighbour */
>>   #define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
>>
>> +enum mpls_payload_type {
>> +	MPT_UNSPEC, /* IPv4 or IPv6 */
>> +	MPT_IPV4 = 4,
>> +	MPT_IPV6 = 6,
>> +
>> +	/* Other types not implemented:
>> +	 *  - Pseudo-wire with or without control word (RFC4385)
>> +	 *  - GAL (RFC5586)
>> +	 */
>> +};
>> +
>>   struct mpls_route { /* next hop label forwarding entry */
>>   	struct net_device __rcu *rt_dev;
>>   	struct rcu_head		rt_rcu;
>>   	u32			rt_label[MAX_NEW_LABELS];
>>   	u8			rt_protocol; /* routing protocol that set this entry */
>>   	u8                      rt_unlabeled : 1;
>> -	u8			rt_labels : 7;
>> +	u8                      rt_payload_type : 3;
>> +	u8			rt_labels : 4;
>>   	u8			rt_via_alen;
>>   	u8			rt_via_table;
>>   	u8			rt_via[0];
>> @@ -90,16 +102,7 @@ static bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
>>   static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
>>   			struct mpls_entry_decoded dec)
>>   {
>> -	/* RFC4385 and RFC5586 encode other packets in mpls such that
>> -	 * they don't conflict with the ip version number, making
>> -	 * decoding by examining the ip version correct in everything
>> -	 * except for the strangest cases.
>> -	 *
>> -	 * The strange cases if we choose to support them will require
>> -	 * manual configuration.
>> -	 */
>> -	struct iphdr *hdr4;
>> -	bool success = true;
>> +	enum mpls_payload_type payload_type;
>>
>>   	/* The IPv4 code below accesses through the IPv4 header
>>   	 * checksum, which is 12 bytes into the packet.
>> @@ -114,24 +117,31 @@ static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
>>   	if (!pskb_may_pull(skb, 12))
>>   		return false;
>>
>> -	/* Use ip_hdr to find the ip protocol version */
>> -	hdr4 = ip_hdr(skb);
>> -	if (hdr4->version == 4) {
>> +	payload_type = rt->rt_payload_type;
>> +	if (payload_type == MPT_UNSPEC)
>> +		payload_type = ip_hdr(skb)->version;
>> +
>> +	switch (payload_type) {
>> +	case MPT_IPV4: {
>> +		struct iphdr *hdr4 = ip_hdr(skb);
>>   		skb->protocol = htons(ETH_P_IP);
>>   		csum_replace2(&hdr4->check,
>>   			      htons(hdr4->ttl << 8),
>>   			      htons(dec.ttl << 8));
>>   		hdr4->ttl = dec.ttl;
>> +		return true;
>>   	}
>> -	else if (hdr4->version == 6) {
>> +	case MPT_IPV6: {
>>   		struct ipv6hdr *hdr6 = ipv6_hdr(skb);
>>   		skb->protocol = htons(ETH_P_IPV6);
>>   		hdr6->hop_limit = dec.ttl;
>> +		return true;
>> +	}
>> +	case MPT_UNSPEC:
>> +		break;
>>   	}
>> -	else
>> -		/* version 0 and version 1 are used by pseudo wires */
>> -		success = false;
>> -	return success;
>> +
>> +	return false;
>>   }
>>
>>   static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>> @@ -254,16 +264,17 @@ static const struct nla_policy rtm_mpls_policy[RTA_MAX+1] = {
>>   };
>>
>>   struct mpls_route_config {
>> -	u32		rc_protocol;
>> -	u32		rc_ifindex;
>> -	u16		rc_via_table;
>> -	u16		rc_via_alen;
>> -	u8		rc_via[MAX_VIA_ALEN];
>> -	u32		rc_label;
>> -	u32		rc_output_labels;
>> -	u32		rc_output_label[MAX_NEW_LABELS];
>> -	u32		rc_nlflags;
>> -	struct nl_info	rc_nlinfo;
>> +	u32			rc_protocol;
>> +	u32			rc_ifindex;
>> +	u16			rc_via_table;
>> +	u16			rc_via_alen;
>> +	u8			rc_via[MAX_VIA_ALEN];
>> +	u32			rc_label;
>> +	u32			rc_output_labels;
>> +	u32			rc_output_label[MAX_NEW_LABELS];
>> +	u32			rc_nlflags;
>> +	enum mpls_payload_type	rc_payload_type;
>> +	struct nl_info		rc_nlinfo;
>>   };
>>
>>   static struct mpls_route *mpls_rt_alloc(size_t alen)
>> @@ -414,6 +425,7 @@ static int mpls_route_add(struct mpls_route_config *cfg)
>>   	}
>>   	rt->rt_protocol = cfg->rc_protocol;
>>   	RCU_INIT_POINTER(rt->rt_dev, dev);
>> +	rt->rt_payload_type = cfg->rc_payload_type;
>>   	rt->rt_via_table = cfg->rc_via_table;
>>   	memcpy(rt->rt_via, cfg->rc_via, cfg->rc_via_alen);
>>
>> @@ -949,6 +961,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
>>   			goto nort0;
>>   		RCU_INIT_POINTER(rt0->rt_dev, lo);
>>   		rt0->rt_protocol = RTPROT_KERNEL;
>> +		rt0->rt_payload_type = MPT_IPV4;
>>   		rt0->rt_via_table = NEIGH_LINK_TABLE;
>>   		memcpy(rt0->rt_via, lo->dev_addr, lo->addr_len);
>>   	}
>> @@ -959,6 +972,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
>>   			goto nort2;
>>   		RCU_INIT_POINTER(rt2->rt_dev, lo);
>>   		rt2->rt_protocol = RTPROT_KERNEL;
>> +		rt2->rt_payload_type = MPT_IPV6;
>>   		rt2->rt_via_table = NEIGH_LINK_TABLE;
>>   		memcpy(rt2->rt_via, lo->dev_addr, lo->addr_len);
>>   	}


* Re: [PATCH net-next v3 3/4] mpls: Per-device enabling of packet input
  2015-04-07 17:02       ` Eric W. Biederman
@ 2015-04-08 14:29         ` Robert Shearman
  2015-04-08 14:44           ` Eric W. Biederman
  0 siblings, 1 reply; 68+ messages in thread
From: Robert Shearman @ 2015-04-08 14:29 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: davem, netdev

On 07/04/15 18:02, Eric W. Biederman wrote:
> Robert Shearman <rshearma@brocade.com> writes:
>
>> An MPLS network is a single trust domain where the edges must be in
>> control of what labels make their way into the core. The simplest way
>> of ensuring this is for the edge device to always impose the labels, and
>> not forward labeled traffic from untrusted neighbours. This is
>> achieved by allowing a per-device configuration of whether MPLS
>> traffic input from that interface should be processed or not.
>>
>> To be secure by default, MPLS is now initially disabled on all
>> interfaces (except the loopback) until explicitly enabled and no
>> global option is provided to change the default. Whilst this differs
>> from other protocols (e.g. IPv6), network operators are used to
>> explicitly enabling MPLS forwarding on interfaces, and with the number
>> of links to the MPLS core typically fairly low this doesn't present
>> too much of a burden on operators.
>
> This really could use breaking up into two patches.
>
> 1 patch that implements mpls_add_dev,
> and a second patch that uses the struct mpls_dev to implement
> the input bit.

Sure, I'll do that.

> As it stands we are currently allowing mpls attributes on devices that
> we do not support the transport of mpls over.  And simply not being able
> to find an mpls_dev would be a faster way to discard packets on those
> devices.

Note that this will change the semantics, since currently we allow MPLS 
packets to be input on device types other than ethernet and loopback, 
whereas with this change they won't by default and won't be able to 
enable it. If that's what you intended and it's desirable then I'll 
proceed with that.
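
To make the resulting behaviour concrete, here is a rough userspace model 
of the three cases (no MPLS state, state with input disabled, state with 
input enabled). The struct and function names are stand-ins invented for 
this sketch, not the kernel definitions:

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the per-device state; the real struct mpls_dev
 * in the patch also carries a sysctl header.
 */
struct mpls_dev_model {
	int input_enabled;
};

/* Mirrors the default in mpls_add_dev: only loopback starts enabled. */
static void mpls_dev_model_init(struct mpls_dev_model *mdev, bool is_loopback)
{
	mdev->input_enabled = is_loopback ? 1 : 0;
}

/* mdev == NULL models a device type with no MPLS state at all, so there
 * is nothing an operator could toggle to enable input on it.
 */
static bool mpls_input_allowed(const struct mpls_dev_model *mdev)
{
	return mdev != NULL && mdev->input_enabled != 0;
}
```

With this model an unsupported device is always rejected, a supported 
non-loopback device is rejected until input_enabled is set, and loopback 
is accepted from the start.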

> Naming the attribute input clears up all of the semantic issues that I
> had with the previous version of this patch.

Thanks for confirming that.

Rob

>
>> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
>> Signed-off-by: Robert Shearman <rshearma@brocade.com>
>> ---
>>   Documentation/networking/mpls-sysctl.txt |   9 +++
>>   include/linux/netdevice.h                |   4 ++
>>   net/mpls/af_mpls.c                       | 115 ++++++++++++++++++++++++++++++-
>>   net/mpls/internal.h                      |   6 ++
>>   4 files changed, 133 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/networking/mpls-sysctl.txt b/Documentation/networking/mpls-sysctl.txt
>> index 639ddf0ece9b..9ed15f86c17c 100644
>> --- a/Documentation/networking/mpls-sysctl.txt
>> +++ b/Documentation/networking/mpls-sysctl.txt
>> @@ -18,3 +18,12 @@ platform_labels - INTEGER
>>
>>   	Possible values: 0 - 1048575
>>   	Default: 0
>> +
>> +conf/<interface>/input - BOOL
>> +	Control whether packets can be input on this interface.
>> +
>> +	If disabled, packets will be discarded without further
>> +	processing.
>> +
>> +	0 - disabled (default)
>> +	not 0 - enabled
>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>> index 76951c5fbedf..ee4ca06375c8 100644
>> --- a/include/linux/netdevice.h
>> +++ b/include/linux/netdevice.h
>> @@ -60,6 +60,7 @@ struct phy_device;
>>   struct wireless_dev;
>>   /* 802.15.4 specific */
>>   struct wpan_dev;
>> +struct mpls_dev;
>>
>>   void netdev_set_default_ethtool_ops(struct net_device *dev,
>>   				    const struct ethtool_ops *ops);
>> @@ -1615,6 +1616,9 @@ struct net_device {
>>   	void			*ax25_ptr;
>>   	struct wireless_dev	*ieee80211_ptr;
>>   	struct wpan_dev		*ieee802154_ptr;
>> +#if IS_ENABLED(CONFIG_MPLS_ROUTING)
>> +	struct mpls_dev __rcu	*mpls_ptr;
>> +#endif
>>
>>   /*
>>    * Cache lines mostly used on receive path (including eth_type_trans())
>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>> index 7f5f30d29f73..0b0420bf110d 100644
>> --- a/net/mpls/af_mpls.c
>> +++ b/net/mpls/af_mpls.c
>> @@ -54,6 +54,11 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
>>   	return rt;
>>   }
>>
>> +static inline struct mpls_dev *mpls_dev_get(const struct net_device *dev)
>> +{
>> +	return rcu_dereference_rtnl(dev->mpls_ptr);
>> +}
>> +
>>   static bool mpls_output_possible(const struct net_device *dev)
>>   {
>>   	return dev && (dev->flags & IFF_UP) && netif_carrier_ok(dev);
>> @@ -137,6 +142,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>>   	struct mpls_route *rt;
>>   	struct mpls_entry_decoded dec;
>>   	struct net_device *out_dev;
>> +	struct mpls_dev *mdev;
>>   	unsigned int hh_len;
>>   	unsigned int new_header_size;
>>   	unsigned int mtu;
>> @@ -144,6 +150,10 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>>
>>   	/* Careful this entire function runs inside of an rcu critical section */
>>
>> +	mdev = mpls_dev_get(dev);
>> +	if (!mdev || !mdev->input_enabled)
>> +		goto drop;
>> +
>>   	if (skb->pkt_type != PACKET_HOST)
>>   		goto drop;
>>
>> @@ -441,10 +451,96 @@ errout:
>>   	return err;
>>   }
>>
>> +#define MPLS_PERDEV_SYSCTL_OFFSET(field)	\
>> +	(&((struct mpls_dev *)0)->field)
>> +
>> +static const struct ctl_table mpls_dev_table[] = {
>> +	{
>> +		.procname	= "input",
>> +		.maxlen		= sizeof(int),
>> +		.mode		= 0644,
>> +		.proc_handler	= proc_dointvec,
>> +		.data		= MPLS_PERDEV_SYSCTL_OFFSET(input_enabled),
>> +	},
>> +	{ }
>> +};
>> +
>> +static int mpls_dev_sysctl_register(struct net_device *dev,
>> +				    struct mpls_dev *mdev)
>> +{
>> +	char path[sizeof("net/mpls/conf/") + IFNAMSIZ];
>> +	struct ctl_table *table;
>> +	int i;
>> +
>> +	table = kmemdup(&mpls_dev_table, sizeof(mpls_dev_table), GFP_KERNEL);
>> +	if (!table)
>> +		goto out;
>> +
>> +	/* Table data contains only offsets relative to the base of
>> +	 * the mdev at this point, so make them absolute.
>> +	 */
>> +	for (i = 0; i < ARRAY_SIZE(mpls_dev_table); i++)
>> +		table[i].data = (char *)mdev + (uintptr_t)table[i].data;
>> +
>> +	snprintf(path, sizeof(path), "net/mpls/conf/%s", dev->name);
>> +
>> +	mdev->sysctl = register_net_sysctl(dev_net(dev), path, table);
>> +	if (!mdev->sysctl)
>> +		goto free;
>> +
>> +	return 0;
>> +
>> +free:
>> +	kfree(table);
>> +out:
>> +	return -ENOBUFS;
>> +}
>> +
>> +static void mpls_dev_sysctl_unregister(struct mpls_dev *mdev)
>> +{
>> +	struct ctl_table *table;
>> +
>> +	table = mdev->sysctl->ctl_table_arg;
>> +	unregister_net_sysctl_table(mdev->sysctl);
>> +	kfree(table);
>> +}
>> +
>> +static struct mpls_dev *mpls_add_dev(struct net_device *dev)
>> +{
>> +	struct mpls_dev *mdev;
>> +	int err = -ENOMEM;
>> +
>> +	ASSERT_RTNL();
>> +
>> +	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
>> +	if (!mdev)
>> +		return ERR_PTR(err);
>> +
>> +	/* Enable MPLS by default on loopback devices, since this
>> +	 * doesn't represent a security boundary and is required for the
>> +	 * lookup of inner labels for LSPs terminating on this router.
>> +	 */
>> +	if (dev->flags & IFF_LOOPBACK)
>> +		mdev->input_enabled = 1;
>> +
>> +	err = mpls_dev_sysctl_register(dev, mdev);
>> +	if (err)
>> +		goto free;
>> +
>> +	rcu_assign_pointer(dev->mpls_ptr, mdev);
>> +
>> +	return mdev;
>> +
>> +free:
>> +	kfree(mdev);
>> +	return ERR_PTR(err);
>> +}
>> +
>>   static void mpls_ifdown(struct net_device *dev)
>>   {
>>   	struct mpls_route __rcu **platform_label;
>>   	struct net *net = dev_net(dev);
>> +	struct mpls_dev *mdev;
>>   	unsigned index;
>>
>>   	platform_label = rtnl_dereference(net->mpls.platform_label);
>> @@ -456,14 +552,31 @@ static void mpls_ifdown(struct net_device *dev)
>>   			continue;
>>   		rt->rt_dev = NULL;
>>   	}
>> +
>> +	mdev = mpls_dev_get(dev);
>> +	if (!mdev)
>> +		return;
>> +
>> +	mpls_dev_sysctl_unregister(mdev);
>> +
>> +	RCU_INIT_POINTER(dev->mpls_ptr, NULL);
>> +
>> +	kfree(mdev);
>>   }
>>
>>   static int mpls_dev_notify(struct notifier_block *this, unsigned long event,
>>   			   void *ptr)
>>   {
>>   	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
>> +	struct mpls_dev *mdev;
>>
>>   	switch(event) {
>> +	case NETDEV_REGISTER:
>> +		mdev = mpls_add_dev(dev);
>> +		if (IS_ERR(mdev))
>> +			return notifier_from_errno(PTR_ERR(mdev));
>> +		break;
>> +
>>   	case NETDEV_UNREGISTER:
>>   		mpls_ifdown(dev);
>>   		break;
>> @@ -925,7 +1038,7 @@ static int mpls_platform_labels(struct ctl_table *table, int write,
>>   	return ret;
>>   }
>>
>> -static struct ctl_table mpls_table[] = {
>> +static const struct ctl_table mpls_table[] = {
>>   	{
>>   		.procname	= "platform_labels",
>>   		.data		= NULL,
>> diff --git a/net/mpls/internal.h b/net/mpls/internal.h
>> index 5732283ee1b9..d0aad5e9a2c9 100644
>> --- a/net/mpls/internal.h
>> +++ b/net/mpls/internal.h
>> @@ -23,6 +23,12 @@ struct mpls_entry_decoded {
>>   	u8 bos;
>>   };
>>
>> +struct mpls_dev {
>> +	int			input_enabled;
>> +
>> +	struct ctl_table_header *sysctl;
>> +};
>> +
>>   struct sk_buff;
>>
>>   static inline struct mpls_shim_hdr *mpls_hdr(const struct sk_buff *skb)


* Re: [PATCH net-next v3 3/4] mpls: Per-device enabling of packet input
  2015-04-08 14:29         ` Robert Shearman
@ 2015-04-08 14:44           ` Eric W. Biederman
  0 siblings, 0 replies; 68+ messages in thread
From: Eric W. Biederman @ 2015-04-08 14:44 UTC (permalink / raw)
  To: Robert Shearman; +Cc: davem, netdev

Robert Shearman <rshearma@brocade.com> writes:

> On 07/04/15 18:02, Eric W. Biederman wrote:
>> Robert Shearman <rshearma@brocade.com> writes:
>>
>>> An MPLS network is a single trust domain where the edges must be in
>>> control of what labels make their way into the core. The simplest way
>>> of ensuring this is for the edge device to always impose the labels, and
>>> not forward labeled traffic from untrusted neighbours. This is
>>> achieved by allowing a per-device configuration of whether MPLS
>>> traffic input from that interface should be processed or not.
>>>
>>> To be secure by default, MPLS is now initially disabled on all
>>> interfaces (except the loopback) until explicitly enabled and no
>>> global option is provided to change the default. Whilst this differs
>>> from other protocols (e.g. IPv6), network operators are used to
>>> explicitly enabling MPLS forwarding on interfaces, and with the number
>>> of links to the MPLS core typically fairly low this doesn't present
>>> too much of a burden on operators.
>>
>> This really could use breaking up into two patches.
>>
>> 1 patch that implements mpls_add_dev,
>> and a second patch that uses the struct mpls_dev to implement
>> the input bit.
>
> Sure, I'll do that.
>
>> As it stands we are currently allowing mpls attributes on devices that
>> we do not support the transport of mpls over.  And simply not being able
>> to find an mpls_dev would be a faster way to discard packets on those
>> devices.
>
> Note that this will change the semantics, since currently we allow MPLS packets
> to be input on device types other than ethernet and loopback, whereas with this
> change they won't by default and won't be able to enable it. If that's what you
> intended and it's desirable then I'll proceed with that.

Yes.  For device types where we haven't figured out how to support MPLS
yet we should just have reception of MPLS packets disabled.

>> Naming the attribute input clears up all of the semantic issues that I
>> had with the previous version of this patch.
>
> Thanks for confirming that.

Eric


* Re: [PATCH net-next v3 2/4] mpls: Differentiate implicit-null and unlabeled neighbours
  2015-04-07 16:56       ` Eric W. Biederman
@ 2015-04-08 17:08         ` Robert Shearman
  0 siblings, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-04-08 17:08 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: davem, netdev

On 07/04/15 17:56, Eric W. Biederman wrote:
> Robert Shearman <rshearma@brocade.com> writes:
>
>> The control plane can advertise labels for neighbours that don't have
>> an outgoing label. RFC 3031 s3.22 states that either the remaining
>> labels should be popped (if the control plane can determine that it's
>> safe to do so, which in light of MPLS-VPN, RFC 4364, is never the case
>> now) or that the packet should be discarded.
>>
>> Therefore, if the peer is unlabeled and the last label wasn't popped
>> then drop the packet. The peer being unlabeled is signalled by an
>> empty label stack. However, penultimate hop popping still needs to be
>> supported (RFC 3031 s4.1.5) where the incoming label is popped and no
>> labels are put on and the packet can still go out labeled with the
>> remainder of the stack. This is achieved by the control plane
>> specifying a label stack consisting of the single special
>> implicit-null value.
>
> I disagree with this approach to limiting what can be in an mpls tunnel.
> I agree that it would be nice to limit what is an mpls tunnel.
>
> However I want the code and semantics as clean as we can make them.
>
> So what I suggest is to add something like
>
> RTA_PSEUDOWIRE
>
> That has an integer for a type with values like:
>
> PW_FRAME_RELAY_DLCI	0x0001
> PW_ATM_AAL5_SDU		0x0002
> PW_ATM_TRANSPARENT_CELL 0x0003
> PW_ETHERNET_TAGGED	0x0004
> PW_ETHERNET		0x0005
> PW_HDLC			0x0006
> PW_PPP			0x0007
> PW_IP			0x000B
>
> Roughly the values from the pseudo wire registry
> http://www.iana.org/assignments/pwe3-parameters/pwe3-parameters.xhtml
>
> That won't quite work because pseudo wires are a subset of what
> can be transported over an MPLS network, and a superset of what
> we implement in the kernel.  So we need a different identifier.
>
> In passing I will note that the current implementation defaults to
> pseudo wire type 0x000B IP layer2 transport.  Which can carry both ipv4
> and ipv6 traffic, as well as a generic associated channel.  So rather than
> being a weird exception to the rules, what I have actually implemented is
> well enough specified that you can signal it.

Note that the G-ACh cannot appear raw over the LSP:

RFC5586 s4:
>    Generalizing the associated control channel mechanism to LSPs and
>    Sections also requires a method to identify that a packet contains an
>    ACH followed by a non-service payload.  This document specifies that
>    a label is used for that purpose and calls this special label the
>    G-ACh Label (GAL).

I'd also like to further note that there are semantic differences 
between a pseudo-wire and a regular LSP, which are analogous to the 
difference between a switch and a router. The most important difference 
is that the TTL is never propagated either on ingress or egress from the 
LSP. There could be more minor differences regarding fragmentation on 
imposition, ICMP generation and the treatment of multicast traffic.
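
As a small illustration of the TTL point only, assuming the egress 
behaviour in the current code where the label's TTL is written into the 
IP header after the pop. The function name and the is_pseudo_wire flag 
are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns the TTL (or hop limit) the inner packet carries after the last
 * label has been popped.
 *
 * Regular LSP: the label's TTL is propagated into the inner header.
 * Pseudo-wire: the inner header is left exactly as it was on ingress.
 */
static uint8_t inner_ttl_after_pop(bool is_pseudo_wire, uint8_t label_ttl,
				   uint8_t inner_ttl)
{
	return is_pseudo_wire ? inner_ttl : label_ttl;
}
```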

>
> So for sake of argument let's call it.
>
> RTA_MPLS_PAYLOAD_TYPE
>
> And have values, something like.
>
> #define MPLS_PL_IPV4		0x4
> #define MPLS_PL_IPV6		0x6
> #define MPLS_PL_MPLS		0x10
> #define MPLS_PL_ETHERNET_TAGGED	0x14
> #define MPLS_PL_ETHERNET	0x15
> #define MPLS_PL_IP		0x1B
>
>
> And have the semantics be that if you have forced the payload type with
> the attribute and the packet does not match the specified payload we
> drop the packet.  Not having the BOS set for anything except for
> MPLS_PL_MPLS would be such an error that would cause the packet to
> be dropped, and having BOS set for MPLS_PL_MPLS would be an error.
>
> Where I am defining MPLS_PL_MPLS to be the payload type of a label
> that transports mpls traffic and is not expected to end at this node.
>
> Although I am not certain that you care about the case I am describing
> being handled by MPLS_PL_MPLS.

I'm happy to go with this approach, but the semantics I'm looking for 
are that a label signaled for the purposes of PHP can carry both IPv4/6 
and MPLS, but a label signaled at the end of an LSP (when not using PHP, 
i.e. for L3VPN or because of an LDP problem) can only carry IPv4/6.

Furthermore, I can't think of a use case for an LSP that can only carry 
other LSPs.

Therefore, I see several options here:
1. Make the payload type a bitmask. This doesn't seem very attractive to 
me as most combinations (e.g. Ethernet|ATM) wouldn't be valid.
2. Introduce further MPLS_PL_IPV4_MPLS and MPLS_PL_IPV6_MPLS options 
that would indicate that the payload could be IPv4/6 or that it's valid 
for there to be further labels.
3. Extract out the property of whether LSP can carry other LSPs or not 
so that it's separate from the payload type.
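
A rough sketch of how option 3 could look, purely to make the semantics 
concrete. The MPLS_PL_* values are the ones floated above; struct 
lsp_props, may_carry_labels and payload_acceptable are names invented 
for this example and nothing here is an existing interface:

```c
#include <stdbool.h>
#include <stdint.h>

/* Payload values as floated in the review; not an existing kernel API. */
#define MPLS_PL_IPV4	0x4
#define MPLS_PL_IPV6	0x6
#define MPLS_PL_IP	0x1B	/* either IPv4 or IPv6 */

/* Hypothetical per-route properties under option 3. */
struct lsp_props {
	unsigned int payload_type;	/* one of MPLS_PL_* */
	bool may_carry_labels;		/* true for a label signaled for PHP */
};

/* bos is the bottom-of-stack bit of the label being popped and
 * first_octet is the first byte following that label.
 */
static bool payload_acceptable(const struct lsp_props *p, bool bos,
			       uint8_t first_octet)
{
	unsigned int version = first_octet >> 4;

	/* More labels follow: only valid if the LSP may carry other LSPs. */
	if (!bos)
		return p->may_carry_labels;

	switch (p->payload_type) {
	case MPLS_PL_IPV4:
		return version == 4;
	case MPLS_PL_IPV6:
		return version == 6;
	case MPLS_PL_IP:
		return version == 4 || version == 6;
	default:
		return false;
	}
}
```

Under this model a PHP label would be { MPLS_PL_IP, true } and a label at 
the end of an LSP would be { MPLS_PL_IP, false }, which avoids needing a 
bitmask or extra combined payload types.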

Any other suggestions? What would be your preference?

Due to the semantic differences above, no matter which approach is taken 
there will need to be a separate type (or types with option 2) for an 
LSP that can carry both IPv4 and IPv6 traffic that isn't a pseudo-wire 
to preserve the existing desired behaviour.

In addition, the property of whether a control word is present or not is 
common for each PW type so that should be represented as a separate 
property within the same route attribute.

> We should also refuse to accept labels with the implicit NULL set in the
> RTA_NEWDST attribute.

Agreed, in conjunction with this alternative approach.

> I have read through a bunch of RFCs and I have not seen your distinction
> between implicit NULL and unlabeled show up anywhere quite the way you are
> making it.   Regardless what you are trying to do seems to be a
> transference from a signalling protocol to the rtnetlink attributes that
> mixes semantics.
>
> The current rtnetlink attribute semantics are clean simple and easy to
> understand.  When you see the label RTA_DST you remove it and you apply
> the label RTA_NEWDST.
>
> I find playing games with implicit NULL as opposed to using an
> RTA_MPLS_PAYLOAD_TYPE type to be mixing of unconnected things and likely
> to lead to maintenance problems in the future.

I can certainly understand that. The reasoning for taking this approach 
was that defining a new netlink route attribute seemed quite heavyweight 
for solving this problem in isolation. However, as you quite rightly 
suggest the attribute could be used for signaling other properties as well.

> Your reference to RFC3031 section 3.22 also does not work to justify
> this behavior as RFC3031 is about receiving a packet whose topmost
> label is not in the label table.

Ok, I can see that there is some ambiguity in that section that means 
that alone it could be read as referring to the top-most label being 
missing from the label table. However, there is another section (s3.18. 
Invalid Incoming Labels - 
https://tools.ietf.org/html/rfc3031#section-3.18) that deals explicitly 
with that case. Therefore, I believe that in conjunction with the 
statement in s3.22 stating "even though the incoming label is itself 
valid" and the title of the section ("Lack of Outgoing Label") the 
intention of the authors was to refer to the case I'm trying to address, 
i.e. that the next-hop is known, but there's no next-hop label 
forwarding entry.

> In short I think the packet handling semantics you are after are quite
> reasonable but your approach is unnecessarily complicated, and
> confusing.

Ok, thanks for the review.

>
> Eric
>


* [PATCH net-next v4 0/6] mpls: Behaviour-changing improvements
  2015-03-30 18:15   ` [PATCH net-next v3 0/4] mpls: Behaviour-changing improvements Robert Shearman
                       ` (5 preceding siblings ...)
  2015-04-06 20:02     ` David Miller
@ 2015-04-14 22:44     ` Robert Shearman
  2015-04-14 22:44       ` [PATCH net-next v4 1/6] mpls: Use definition for reserved label checks Robert Shearman
                         ` (6 more replies)
  6 siblings, 7 replies; 68+ messages in thread
From: Robert Shearman @ 2015-04-14 22:44 UTC (permalink / raw)
  To: davem, ebiederm; +Cc: netdev, Robert Shearman

V4:
  - Split out per-device enabling of packet input into two
    patches. The first for creating the struct mpls_dev state per
    interface and the second actually implementing the enable config.
  - Reworked unlabeled patch to use new rtnetlink attribute,
    RTA_MPLS_PAYLOAD_TYPE instead of a special label value.
  - In payload type association patch, use success local variable
    instead of multiple return statements.
  - New patch to disallow the use of imp-null as an outgoing label.
V3:
  - Dropped PHP comment patch to avoid holding up the rest of the
    changes due to quibbling on nomenclature.
  - Corrected reference to RFC 3031 in commit message of patch
    2. Added reference to RFC 3031 s4.1.5 for PHP behaviour.
  - s/forwarding/input/ in patch 3.
  - Made MPT_IPV4 and MPT_IPV6 equal to 4 and 6 respectively in patch
    4, eliminating a switch on the version number as suggested by
    review comments. Added back references to RFCs, but moved them to
    mpls_payload_type enum declaration.
V2:
  - Updated to reference the correct RFC in the first patch.

This series consists of several small changes to make it easier to
understand the code, along with security and RFC-compliance
changes. These are important to consider before userspace begins
relying on the previous behaviour.

Robert Shearman (6):
  mpls: Use definition for reserved label checks
  mpls: Per-device MPLS state
  mpls: Per-device enabling of packet input
  mpls: Allow payload type to be associated with label routes
  mpls: Differentiate implicit-null and unlabeled neighbours
  mpls: Prevent use of implicit NULL label as outgoing label

 Documentation/networking/mpls-sysctl.txt |   9 ++
 include/linux/netdevice.h                |   4 +
 include/uapi/linux/mpls.h                |  16 ++
 include/uapi/linux/rtnetlink.h           |   1 +
 net/mpls/af_mpls.c                       | 242 ++++++++++++++++++++++++++-----
 net/mpls/internal.h                      |   7 +
 6 files changed, 239 insertions(+), 40 deletions(-)

-- 
2.1.4


* [PATCH net-next v4 1/6] mpls: Use definition for reserved label checks
  2015-04-14 22:44     ` [PATCH net-next v4 0/6] " Robert Shearman
@ 2015-04-14 22:44       ` Robert Shearman
  2015-04-14 22:44       ` [PATCH net-next v4 2/6] mpls: Per-device MPLS state Robert Shearman
                         ` (5 subsequent siblings)
  6 siblings, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-04-14 22:44 UTC (permalink / raw)
  To: davem, ebiederm; +Cc: netdev, Robert Shearman

In multiple locations there are checks for whether the label in hand
is a reserved label or not using the arbitrary value of 16. Factor
this out into a #define for better maintainability and for
documentation.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 net/mpls/af_mpls.c  | 20 ++++++++++----------
 net/mpls/internal.h |  1 +
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index db8a2ea6d4de..0d6763a895d6 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -276,7 +276,7 @@ static void mpls_notify_route(struct net *net, unsigned index,
 	struct mpls_route *rt = new ? new : old;
 	unsigned nlm_flags = (old && new) ? NLM_F_REPLACE : 0;
 	/* Ignore reserved labels for now */
-	if (rt && (index >= 16))
+	if (rt && (index >= LABEL_FIRST_UNRESERVED))
 		rtmsg_lfib(event, index, rt, nlh, net, portid, nlm_flags);
 }
 
@@ -310,7 +310,7 @@ static unsigned find_free_label(struct net *net)
 
 	platform_label = rtnl_dereference(net->mpls.platform_label);
 	platform_labels = net->mpls.platform_labels;
-	for (index = 16; index < platform_labels; index++) {
+	for (index = LABEL_FIRST_UNRESERVED; index < platform_labels; index++) {
 		if (!rtnl_dereference(platform_label[index]))
 			return index;
 	}
@@ -335,8 +335,8 @@ static int mpls_route_add(struct mpls_route_config *cfg)
 		index = find_free_label(net);
 	}
 
-	/* The first 16 labels are reserved, and may not be set */
-	if (index < 16)
+	/* Reserved labels may not be set */
+	if (index < LABEL_FIRST_UNRESERVED)
 		goto errout;
 
 	/* The full 20 bit range may not be supported. */
@@ -413,8 +413,8 @@ static int mpls_route_del(struct mpls_route_config *cfg)
 
 	index = cfg->rc_label;
 
-	/* The first 16 labels are reserved, and may not be removed */
-	if (index < 16)
+	/* Reserved labels may not be removed */
+	if (index < LABEL_FIRST_UNRESERVED)
 		goto errout;
 
 	/* The full 20 bit range may not be supported */
@@ -610,8 +610,8 @@ static int rtm_to_route_config(struct sk_buff *skb,  struct nlmsghdr *nlh,
 					   &cfg->rc_label))
 				goto errout;
 
-			/* The first 16 labels are reserved, and may not be set */
-			if (cfg->rc_label < 16)
+			/* Reserved labels may not be set */
+			if (cfg->rc_label < LABEL_FIRST_UNRESERVED)
 				goto errout;
 
 			break;
@@ -736,8 +736,8 @@ static int mpls_dump_routes(struct sk_buff *skb, struct netlink_callback *cb)
 	ASSERT_RTNL();
 
 	index = cb->args[0];
-	if (index < 16)
-		index = 16;
+	if (index < LABEL_FIRST_UNRESERVED)
+		index = LABEL_FIRST_UNRESERVED;
 
 	platform_label = rtnl_dereference(net->mpls.platform_label);
 	platform_labels = net->mpls.platform_labels;
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index fb6de92052c4..5732283ee1b9 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -9,6 +9,7 @@
 #define LABEL_GAL			13 /* RFC5586 */
 #define LABEL_OAM_ALERT			14 /* RFC3429 */
 #define LABEL_EXTENSION			15 /* RFC7274 */
+#define LABEL_FIRST_UNRESERVED		16 /* RFC3032 */
 
 
 struct mpls_shim_hdr {
-- 
2.1.4


* [PATCH net-next v4 2/6] mpls: Per-device MPLS state
  2015-04-14 22:44     ` [PATCH net-next v4 0/6] " Robert Shearman
  2015-04-14 22:44       ` [PATCH net-next v4 1/6] mpls: Use definition for reserved label checks Robert Shearman
@ 2015-04-14 22:44       ` Robert Shearman
  2015-04-14 22:45       ` [PATCH net-next v4 3/6] mpls: Per-device enabling of packet input Robert Shearman
                         ` (4 subsequent siblings)
  6 siblings, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-04-14 22:44 UTC (permalink / raw)
  To: davem, ebiederm; +Cc: netdev, Robert Shearman

Add per-device MPLS state to supported interfaces. Use the presence of
this state in mpls_route_add to determine that this is a supported
interface.

Use the presence of mpls_dev to drop packets that arrived on an
unsupported interface - previously they were allowed through.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 include/linux/netdevice.h |  4 ++++
 net/mpls/af_mpls.c        | 50 +++++++++++++++++++++++++++++++++++++++++++++--
 net/mpls/internal.h       |  3 +++
 3 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 13acb3d8ecdd..51b7342d6354 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -60,6 +60,7 @@ struct phy_device;
 struct wireless_dev;
 /* 802.15.4 specific */
 struct wpan_dev;
+struct mpls_dev;
 
 void netdev_set_default_ethtool_ops(struct net_device *dev,
 				    const struct ethtool_ops *ops);
@@ -1629,6 +1630,9 @@ struct net_device {
 	void			*ax25_ptr;
 	struct wireless_dev	*ieee80211_ptr;
 	struct wpan_dev		*ieee802154_ptr;
+#if IS_ENABLED(CONFIG_MPLS_ROUTING)
+	struct mpls_dev __rcu	*mpls_ptr;
+#endif
 
 /*
  * Cache lines mostly used on receive path (including eth_type_trans())
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 0d6763a895d6..6e7a91ec1ae1 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -53,6 +53,11 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
 	return rt;
 }
 
+static inline struct mpls_dev *mpls_dev_get(const struct net_device *dev)
+{
+	return rcu_dereference_rtnl(dev->mpls_ptr);
+}
+
 static bool mpls_output_possible(const struct net_device *dev)
 {
 	return dev && (dev->flags & IFF_UP) && netif_carrier_ok(dev);
@@ -136,6 +141,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	struct mpls_route *rt;
 	struct mpls_entry_decoded dec;
 	struct net_device *out_dev;
+	struct mpls_dev *mdev;
 	unsigned int hh_len;
 	unsigned int new_header_size;
 	unsigned int mtu;
@@ -143,6 +149,10 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 
 	/* Careful this entire function runs inside of an rcu critical section */
 
+	mdev = mpls_dev_get(dev);
+	if (!mdev)
+		goto drop;
+
 	if (skb->pkt_type != PACKET_HOST)
 		goto drop;
 
@@ -352,9 +362,9 @@ static int mpls_route_add(struct mpls_route_config *cfg)
 	if (!dev)
 		goto errout;
 
-	/* For now just support ethernet devices */
+	/* Ensure this is a supported device */
 	err = -EINVAL;
-	if ((dev->type != ARPHRD_ETHER) && (dev->type != ARPHRD_LOOPBACK))
+	if (!mpls_dev_get(dev))
 		goto errout;
 
 	err = -EINVAL;
@@ -428,10 +438,27 @@ errout:
 	return err;
 }
 
+static struct mpls_dev *mpls_add_dev(struct net_device *dev)
+{
+	struct mpls_dev *mdev;
+	int err = -ENOMEM;
+
+	ASSERT_RTNL();
+
+	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
+	if (!mdev)
+		return ERR_PTR(err);
+
+	rcu_assign_pointer(dev->mpls_ptr, mdev);
+
+	return mdev;
+}
+
 static void mpls_ifdown(struct net_device *dev)
 {
 	struct mpls_route __rcu **platform_label;
 	struct net *net = dev_net(dev);
+	struct mpls_dev *mdev;
 	unsigned index;
 
 	platform_label = rtnl_dereference(net->mpls.platform_label);
@@ -443,14 +470,33 @@ static void mpls_ifdown(struct net_device *dev)
 			continue;
 		rt->rt_dev = NULL;
 	}
+
+	mdev = mpls_dev_get(dev);
+	if (!mdev)
+		return;
+
+	RCU_INIT_POINTER(dev->mpls_ptr, NULL);
+
+	kfree(mdev);
 }
 
 static int mpls_dev_notify(struct notifier_block *this, unsigned long event,
 			   void *ptr)
 {
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct mpls_dev *mdev;
 
 	switch(event) {
+	case NETDEV_REGISTER:
+		/* For now just support ethernet devices */
+		if ((dev->type == ARPHRD_ETHER) ||
+		    (dev->type == ARPHRD_LOOPBACK)) {
+			mdev = mpls_add_dev(dev);
+			if (IS_ERR(mdev))
+				return notifier_from_errno(PTR_ERR(mdev));
+		}
+		break;
+
 	case NETDEV_UNREGISTER:
 		mpls_ifdown(dev);
 		break;
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index 5732283ee1b9..7de7e7850d1a 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -23,6 +23,9 @@ struct mpls_entry_decoded {
 	u8 bos;
 };
 
+struct mpls_dev {
+};
+
 struct sk_buff;
 
 static inline struct mpls_shim_hdr *mpls_hdr(const struct sk_buff *skb)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH net-next v4 3/6] mpls: Per-device enabling of packet input
  2015-04-14 22:44     ` [PATCH net-next v4 0/6] " Robert Shearman
  2015-04-14 22:44       ` [PATCH net-next v4 1/6] mpls: Use definition for reserved label checks Robert Shearman
  2015-04-14 22:44       ` [PATCH net-next v4 2/6] mpls: Per-device MPLS state Robert Shearman
@ 2015-04-14 22:45       ` Robert Shearman
  2015-04-14 22:45       ` [PATCH net-next v4 4/6] mpls: Allow payload type to be associated with label routes Robert Shearman
                         ` (3 subsequent siblings)
  6 siblings, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-04-14 22:45 UTC (permalink / raw)
  To: davem, ebiederm; +Cc: netdev, Robert Shearman

An MPLS network is a single trust domain where the edges must be in
control of what labels make their way into the core. The simplest way
of ensuring this is for the edge device to always impose the labels,
and to not forward labeled traffic from untrusted neighbours. This
is achieved by allowing a per-device configuration of whether MPLS
traffic input from that interface should be processed or not.

To be secure by default, the default state is changed to MPLS being
disabled on all interfaces (except the loopback) unless explicitly
enabled and no global option is provided to change the default. Whilst
this differs from other protocols (e.g. IPv6), network operators are
used to explicitly enabling MPLS forwarding on interfaces, and with
the number of links to the MPLS core typically fairly low this doesn't
present too much of a burden on operators.
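
As a rough illustration of how an operator-side tool might flip the new
per-interface knob, the userspace sketch below builds the procfs path that
corresponds to the "net/mpls/conf/<interface>/input" sysctl and writes to it.
The helper names mpls_input_sysctl_path() and mpls_set_input() are
hypothetical and not part of this patch; the sketch assumes sysctls are
exposed under /proc/sys in the usual way.

```c
#include <stdio.h>

/* Hypothetical helper (not in the patch): format the procfs path of the
 * per-interface "input" sysctl. Returns 0 on success, -1 if the buffer
 * is too small.
 */
static int mpls_input_sysctl_path(char *buf, size_t len, const char *ifname)
{
	int n = snprintf(buf, len, "/proc/sys/net/mpls/conf/%s/input", ifname);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper (not in the patch): enable or disable MPLS input
 * on one interface. Returns 0 on success, -1 on any failure, e.g. when
 * the interface has no MPLS state and therefore no sysctl directory.
 */
static int mpls_set_input(const char *ifname, int enable)
{
	char path[128];
	FILE *f;
	int ret = 0;

	if (mpls_input_sysctl_path(path, sizeof(path), ifname))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	if (fputs(enable ? "1\n" : "0\n", f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)
		ret = -1;

	return ret;
}
```

With the default introduced here, something like mpls_set_input("eth0", 1)
would be needed on each core-facing link before labeled packets received on
it are processed.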

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 Documentation/networking/mpls-sysctl.txt |  9 ++++
 net/mpls/af_mpls.c                       | 75 +++++++++++++++++++++++++++++++-
 net/mpls/internal.h                      |  3 ++
 3 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/mpls-sysctl.txt b/Documentation/networking/mpls-sysctl.txt
index 639ddf0ece9b..9ed15f86c17c 100644
--- a/Documentation/networking/mpls-sysctl.txt
+++ b/Documentation/networking/mpls-sysctl.txt
@@ -18,3 +18,12 @@ platform_labels - INTEGER
 
 	Possible values: 0 - 1048575
 	Default: 0
+
+conf/<interface>/input - BOOL
+	Control whether packets can be input on this interface.
+
+	If disabled, packets will be discarded without further
+	processing.
+
+	0 - disabled (default)
+	not 0 - enabled
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 6e7a91ec1ae1..1fd303a87df4 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -150,7 +150,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	/* Careful this entire function runs inside of an rcu critical section */
 
 	mdev = mpls_dev_get(dev);
-	if (!mdev)
+	if (!mdev || !mdev->input_enabled)
 		goto drop;
 
 	if (skb->pkt_type != PACKET_HOST)
@@ -438,6 +438,60 @@ errout:
 	return err;
 }
 
+#define MPLS_PERDEV_SYSCTL_OFFSET(field)	\
+	(&((struct mpls_dev *)0)->field)
+
+static const struct ctl_table mpls_dev_table[] = {
+	{
+		.procname	= "input",
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+		.data		= MPLS_PERDEV_SYSCTL_OFFSET(input_enabled),
+	},
+	{ }
+};
+
+static int mpls_dev_sysctl_register(struct net_device *dev,
+				    struct mpls_dev *mdev)
+{
+	char path[sizeof("net/mpls/conf/") + IFNAMSIZ];
+	struct ctl_table *table;
+	int i;
+
+	table = kmemdup(&mpls_dev_table, sizeof(mpls_dev_table), GFP_KERNEL);
+	if (!table)
+		goto out;
+
+	/* Table data contains only offsets relative to the base of
+	 * the mdev at this point, so make them absolute.
+	 */
+	for (i = 0; i < ARRAY_SIZE(mpls_dev_table); i++)
+		table[i].data = (char *)mdev + (uintptr_t)table[i].data;
+
+	snprintf(path, sizeof(path), "net/mpls/conf/%s", dev->name);
+
+	mdev->sysctl = register_net_sysctl(dev_net(dev), path, table);
+	if (!mdev->sysctl)
+		goto free;
+
+	return 0;
+
+free:
+	kfree(table);
+out:
+	return -ENOBUFS;
+}
+
+static void mpls_dev_sysctl_unregister(struct mpls_dev *mdev)
+{
+	struct ctl_table *table;
+
+	table = mdev->sysctl->ctl_table_arg;
+	unregister_net_sysctl_table(mdev->sysctl);
+	kfree(table);
+}
+
 static struct mpls_dev *mpls_add_dev(struct net_device *dev)
 {
 	struct mpls_dev *mdev;
@@ -449,9 +503,24 @@ static struct mpls_dev *mpls_add_dev(struct net_device *dev)
 	if (!mdev)
 		return ERR_PTR(err);
 
+	/* Enable MPLS by default on loopback devices, since this
+	 * doesn't represent a security boundary and is required for the
+	 * lookup of inner labels for LSPs terminating on this router.
+	 */
+	if (dev->flags & IFF_LOOPBACK)
+		mdev->input_enabled = 1;
+
+	err = mpls_dev_sysctl_register(dev, mdev);
+	if (err)
+		goto free;
+
 	rcu_assign_pointer(dev->mpls_ptr, mdev);
 
 	return mdev;
+
+free:
+	kfree(mdev);
+	return ERR_PTR(err);
 }
 
 static void mpls_ifdown(struct net_device *dev)
@@ -475,6 +544,8 @@ static void mpls_ifdown(struct net_device *dev)
 	if (!mdev)
 		return;
 
+	mpls_dev_sysctl_unregister(mdev);
+
 	RCU_INIT_POINTER(dev->mpls_ptr, NULL);
 
 	kfree(mdev);
@@ -958,7 +1029,7 @@ static int mpls_platform_labels(struct ctl_table *table, int write,
 	return ret;
 }
 
-static struct ctl_table mpls_table[] = {
+static const struct ctl_table mpls_table[] = {
 	{
 		.procname	= "platform_labels",
 		.data		= NULL,
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index 7de7e7850d1a..d0aad5e9a2c9 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -24,6 +24,9 @@ struct mpls_entry_decoded {
 };
 
 struct mpls_dev {
+	int			input_enabled;
+
+	struct ctl_table_header *sysctl;
 };
 
 struct sk_buff;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH net-next v4 4/6] mpls: Allow payload type to be associated with label routes
  2015-04-14 22:44     ` [PATCH net-next v4 0/6] " Robert Shearman
                         ` (2 preceding siblings ...)
  2015-04-14 22:45       ` [PATCH net-next v4 3/6] mpls: Per-device enabling of packet input Robert Shearman
@ 2015-04-14 22:45       ` Robert Shearman
  2015-04-14 22:45       ` [PATCH net-next v4 5/6] mpls: Differentiate implicit-null and unlabeled neighbours Robert Shearman
                         ` (2 subsequent siblings)
  6 siblings, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-04-14 22:45 UTC (permalink / raw)
  To: davem, ebiederm; +Cc: netdev, Robert Shearman

RFC 4182 s2 states that if an IPv4 Explicit NULL label is the only
label on the stack, then after popping the resulting packet must be
treated as an IPv4 packet and forwarded based on the IPv4 header. The
same is true for IPv6 Explicit NULL with an IPv6 packet following.

Therefore, when installing the IPv4/IPv6 Explicit NULL label routes,
add an attribute that specifies the expected payload type for use at
forwarding time for determining the type of the encapsulated packet
instead of inspecting the first nibble of the packet.
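
The selection rule can be sketched in isolation as the userspace model
below. It is not the kernel code: mpls_resolve_payload() is a hypothetical
name, and the first payload byte is passed in directly instead of being read
from an skb. The enum values mirror the ones added by this patch.

```c
#include <stdint.h>

enum mpls_payload_type {
	MPT_UNSPEC,	/* IPv4 or IPv6, decided from the packet itself */
	MPT_IPV4 = 4,
	MPT_IPV6 = 6,
};

/* Hypothetical model of the decision made in mpls_egress(): a type
 * configured on the route wins; only an unspecified type falls back to
 * the IP version nibble. Returns MPT_UNSPEC when the payload cannot be
 * classified, which corresponds to the packet being dropped.
 */
static enum mpls_payload_type
mpls_resolve_payload(enum mpls_payload_type configured, uint8_t first_byte)
{
	unsigned int type = configured;

	if (type == MPT_UNSPEC)
		type = first_byte >> 4;	/* IP version nibble */

	switch (type) {
	case MPT_IPV4:
		return MPT_IPV4;
	case MPT_IPV6:
		return MPT_IPV6;
	default:
		return MPT_UNSPEC;
	}
}
```

In this model a route installed for the IPv4 Explicit NULL label always
yields MPT_IPV4, whatever the first nibble of the payload says.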

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 net/mpls/af_mpls.c | 71 +++++++++++++++++++++++++++++++++---------------------
 1 file changed, 44 insertions(+), 27 deletions(-)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 1fd303a87df4..f802578f5172 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -23,11 +23,23 @@
 /* This maximum ha length copied from the definition of struct neighbour */
 #define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
 
+enum mpls_payload_type {
+	MPT_UNSPEC, /* IPv4 or IPv6 */
+	MPT_IPV4 = 4,
+	MPT_IPV6 = 6,
+
+	/* Other types not implemented:
+	 *  - Pseudo-wire with or without control word (RFC4385)
+	 *  - GAL (RFC5586)
+	 */
+};
+
 struct mpls_route { /* next hop label forwarding entry */
 	struct net_device __rcu *rt_dev;
 	struct rcu_head		rt_rcu;
 	u32			rt_label[MAX_NEW_LABELS];
 	u8			rt_protocol; /* routing protocol that set this entry */
+	u8                      rt_payload_type;
 	u8			rt_labels;
 	u8			rt_via_alen;
 	u8			rt_via_table;
@@ -89,16 +101,8 @@ static bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
 			struct mpls_entry_decoded dec)
 {
-	/* RFC4385 and RFC5586 encode other packets in mpls such that
-	 * they don't conflict with the ip version number, making
-	 * decoding by examining the ip version correct in everything
-	 * except for the strangest cases.
-	 *
-	 * The strange cases if we choose to support them will require
-	 * manual configuration.
-	 */
-	struct iphdr *hdr4;
-	bool success = true;
+	enum mpls_payload_type payload_type;
+	bool success = false;
 
 	/* The IPv4 code below accesses through the IPv4 header
 	 * checksum, which is 12 bytes into the packet.
@@ -113,23 +117,32 @@ static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
 	if (!pskb_may_pull(skb, 12))
 		return false;
 
-	/* Use ip_hdr to find the ip protocol version */
-	hdr4 = ip_hdr(skb);
-	if (hdr4->version == 4) {
+	payload_type = rt->rt_payload_type;
+	if (payload_type == MPT_UNSPEC)
+		payload_type = ip_hdr(skb)->version;
+
+	switch (payload_type) {
+	case MPT_IPV4: {
+		struct iphdr *hdr4 = ip_hdr(skb);
 		skb->protocol = htons(ETH_P_IP);
 		csum_replace2(&hdr4->check,
 			      htons(hdr4->ttl << 8),
 			      htons(dec.ttl << 8));
 		hdr4->ttl = dec.ttl;
+		success = true;
+		break;
 	}
-	else if (hdr4->version == 6) {
+	case MPT_IPV6: {
 		struct ipv6hdr *hdr6 = ipv6_hdr(skb);
 		skb->protocol = htons(ETH_P_IPV6);
 		hdr6->hop_limit = dec.ttl;
+		success = true;
+		break;
 	}
-	else
-		/* version 0 and version 1 are used by pseudo wires */
-		success = false;
+	case MPT_UNSPEC:
+		break;
+	}
+
 	return success;
 }
 
@@ -248,16 +261,17 @@ static const struct nla_policy rtm_mpls_policy[RTA_MAX+1] = {
 };
 
 struct mpls_route_config {
-	u32		rc_protocol;
-	u32		rc_ifindex;
-	u16		rc_via_table;
-	u16		rc_via_alen;
-	u8		rc_via[MAX_VIA_ALEN];
-	u32		rc_label;
-	u32		rc_output_labels;
-	u32		rc_output_label[MAX_NEW_LABELS];
-	u32		rc_nlflags;
-	struct nl_info	rc_nlinfo;
+	u32			rc_protocol;
+	u32			rc_ifindex;
+	u16			rc_via_table;
+	u16			rc_via_alen;
+	u8			rc_via[MAX_VIA_ALEN];
+	u32			rc_label;
+	u32			rc_output_labels;
+	u32			rc_output_label[MAX_NEW_LABELS];
+	u32			rc_nlflags;
+	enum mpls_payload_type	rc_payload_type;
+	struct nl_info		rc_nlinfo;
 };
 
 static struct mpls_route *mpls_rt_alloc(size_t alen)
@@ -401,6 +415,7 @@ static int mpls_route_add(struct mpls_route_config *cfg)
 		rt->rt_label[i] = cfg->rc_output_label[i];
 	rt->rt_protocol = cfg->rc_protocol;
 	RCU_INIT_POINTER(rt->rt_dev, dev);
+	rt->rt_payload_type = cfg->rc_payload_type;
 	rt->rt_via_table = cfg->rc_via_table;
 	memcpy(rt->rt_via, cfg->rc_via, cfg->rc_via_alen);
 
@@ -940,6 +955,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
 			goto nort0;
 		RCU_INIT_POINTER(rt0->rt_dev, lo);
 		rt0->rt_protocol = RTPROT_KERNEL;
+		rt0->rt_payload_type = MPT_IPV4;
 		rt0->rt_via_table = NEIGH_LINK_TABLE;
 		memcpy(rt0->rt_via, lo->dev_addr, lo->addr_len);
 	}
@@ -950,6 +966,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
 			goto nort2;
 		RCU_INIT_POINTER(rt2->rt_dev, lo);
 		rt2->rt_protocol = RTPROT_KERNEL;
+		rt2->rt_payload_type = MPT_IPV6;
 		rt2->rt_via_table = NEIGH_LINK_TABLE;
 		memcpy(rt2->rt_via, lo->dev_addr, lo->addr_len);
 	}
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH net-next v4 5/6] mpls: Differentiate implicit-null and unlabeled neighbours
  2015-04-14 22:44     ` [PATCH net-next v4 0/6] " Robert Shearman
                         ` (3 preceding siblings ...)
  2015-04-14 22:45       ` [PATCH net-next v4 4/6] mpls: Allow payload type to be associated with label routes Robert Shearman
@ 2015-04-14 22:45       ` Robert Shearman
  2015-04-14 22:45       ` [PATCH net-next v4 6/6] mpls: Prevent use of implicit NULL label as outgoing label Robert Shearman
  2015-04-21 20:34       ` [PATCH 0/3] mpls: ABI changes for security and correctness Robert Shearman
  6 siblings, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-04-14 22:45 UTC (permalink / raw)
  To: davem, ebiederm; +Cc: netdev, Robert Shearman

The control plane can advertise labels for neighbours that don't have
an outgoing label, which means that in terms of RFC3031 the label is
valid, but there won't be an NHLFE. RFC3031 s3.22 states in this
situation that either the remaining labels should be popped (if the
control plane can determine that it's safe to do so, which in light of
MPLS-VPN, RFC4364, is never the case now) or that the packet should
be discarded.

Therefore, introduce a new route attribute, RTA_MPLS_PAYLOAD_TYPE,
that allows the control plane to restrict/specify what traffic is
carried by the LSP (suggested by Eric W. Biederman). Add a flag that
can be used in combination with a type to allow the control plane to
specify that packets arriving on an LSP must be BOS only. Otherwise,
the packets are dropped.
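
A minimal userspace sketch of how the attribute value is checked, assuming
the constants defined in include/uapi/linux/mpls.h by this patch. The
function names rtmpt_valid() and rtmpt_must_drop() are hypothetical.
rtmpt_must_drop() models only the behaviour described above; the first
branch condition in mpls_forward() is not visible in the hunk, so the real
test may be stricter.

```c
#include <stdbool.h>
#include <stdint.h>

#define RTMPT_IP		0x0000u	/* IPv4 or IPv6 */
#define RTMPT_IPV4		0x0004u
#define RTMPT_IPV6		0x0006u
#define RTMPT_TYPE_MASK		0x0000ffffu
#define RTMPT_FLAG_BOS_ONLY	0x80000000u
#define RTMPT_ALL_FLAGS		(RTMPT_FLAG_BOS_ONLY)

/* Hypothetical model of the RTA_MPLS_PAYLOAD_TYPE check done when a
 * route is configured: reject unknown flag bits, then unknown types.
 */
static bool rtmpt_valid(uint32_t value)
{
	if (value & ~(RTMPT_TYPE_MASK | RTMPT_ALL_FLAGS))
		return false;

	switch (value & RTMPT_TYPE_MASK) {
	case RTMPT_IP:
	case RTMPT_IPV4:
	case RTMPT_IPV6:
		return true;
	default:
		return false;
	}
}

/* Hypothetical model of the forwarding-time rule: on a BOS-only route,
 * a packet that still carries labels below the popped one is dropped.
 */
static bool rtmpt_must_drop(uint32_t route_type, bool bos)
{
	return !bos && (route_type & RTMPT_FLAG_BOS_ONLY) != 0;
}
```

A control plane describing an unlabeled IPv4 neighbour would, under these
assumptions, send RTMPT_IPV4 | RTMPT_FLAG_BOS_ONLY.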

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 include/uapi/linux/mpls.h      | 16 +++++++++++
 include/uapi/linux/rtnetlink.h |  1 +
 net/mpls/af_mpls.c             | 61 +++++++++++++++++++++++++++---------------
 3 files changed, 57 insertions(+), 21 deletions(-)

diff --git a/include/uapi/linux/mpls.h b/include/uapi/linux/mpls.h
index bc9abfe88c9a..fb6aa9a054a8 100644
--- a/include/uapi/linux/mpls.h
+++ b/include/uapi/linux/mpls.h
@@ -31,4 +31,20 @@ struct mpls_label {
 #define MPLS_LS_TTL_MASK        0x000000FF
 #define MPLS_LS_TTL_SHIFT       0
 
+/* RTA_MPLS_PAYLOAD_TYPE - u32 specifying type and zero or more flags */
+enum rtmpls_payload_type {
+	RTMPT_IP		= 0x0000, /* IPv4 or IPv6 */
+	RTMPT_IPV4		= 0x0004,
+	RTMPT_IPV6		= 0x0006,
+
+	/* Other types not implemented:
+	 *  - Pseudo-wire with or without control word (RFC4385)
+	 *  - GAL (RFC5586)
+	 */
+};
+#define RTMPT_TYPE_MASK		0x0000ffff
+
+#define RTMPT_FLAG_BOS_ONLY	0x80000000
+#define RTMPT_ALL_FLAGS		(RTMPT_FLAG_BOS_ONLY)
+
 #endif /* _UAPI_MPLS_H */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 974db03f7b1a..aa9b7a775a2e 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -308,6 +308,7 @@ enum rtattr_type_t {
 	RTA_VIA,
 	RTA_NEWDST,
 	RTA_PREF,
+	RTA_MPLS_PAYLOAD_TYPE,
 	__RTA_MAX
 };
 
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index f802578f5172..e99f88556d6b 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -23,23 +23,12 @@
 /* This maximum ha length copied from the definition of struct neighbour */
 #define MAX_VIA_ALEN (ALIGN(MAX_ADDR_LEN, sizeof(unsigned long)))
 
-enum mpls_payload_type {
-	MPT_UNSPEC, /* IPv4 or IPv6 */
-	MPT_IPV4 = 4,
-	MPT_IPV6 = 6,
-
-	/* Other types not implemented:
-	 *  - Pseudo-wire with or without control word (RFC4385)
-	 *  - GAL (RFC5586)
-	 */
-};
-
 struct mpls_route { /* next hop label forwarding entry */
 	struct net_device __rcu *rt_dev;
 	struct rcu_head		rt_rcu;
 	u32			rt_label[MAX_NEW_LABELS];
+	u32                     rt_payload_type;
 	u8			rt_protocol; /* routing protocol that set this entry */
-	u8                      rt_payload_type;
 	u8			rt_labels;
 	u8			rt_via_alen;
 	u8			rt_via_table;
@@ -101,7 +90,7 @@ static bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
 static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
 			struct mpls_entry_decoded dec)
 {
-	enum mpls_payload_type payload_type;
+	enum rtmpls_payload_type payload_type;
 	bool success = false;
 
 	/* The IPv4 code below accesses through the IPv4 header
@@ -117,12 +106,12 @@ static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
 	if (!pskb_may_pull(skb, 12))
 		return false;
 
-	payload_type = rt->rt_payload_type;
-	if (payload_type == MPT_UNSPEC)
+	payload_type = rt->rt_payload_type & RTMPT_TYPE_MASK;
+	if (payload_type == RTMPT_IP)
 		payload_type = ip_hdr(skb)->version;
 
 	switch (payload_type) {
-	case MPT_IPV4: {
+	case RTMPT_IPV4: {
 		struct iphdr *hdr4 = ip_hdr(skb);
 		skb->protocol = htons(ETH_P_IP);
 		csum_replace2(&hdr4->check,
@@ -132,14 +121,15 @@ static bool mpls_egress(struct mpls_route *rt, struct sk_buff *skb,
 		success = true;
 		break;
 	}
-	case MPT_IPV6: {
+	case RTMPT_IPV6: {
 		struct ipv6hdr *hdr6 = ipv6_hdr(skb);
 		skb->protocol = htons(ETH_P_IPV6);
 		hdr6->hop_limit = dec.ttl;
 		success = true;
 		break;
 	}
-	case MPT_UNSPEC:
+	case RTMPT_IP:
+		/* Should have decided which protocol it is by now */
 		break;
 	}
 
@@ -225,6 +215,11 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 		/* Penultimate hop popping */
 		if (!mpls_egress(rt, skb, dec))
 			goto drop;
+	} else if (rt->rt_payload_type & RTMPT_FLAG_BOS_ONLY) {
+		/* Labeled traffic destined to unlabeled peer should
+		 * be discarded
+		 */
+		goto drop;
 	} else {
 		bool bos;
 		int i;
@@ -258,6 +253,7 @@ static struct packet_type mpls_packet_type __read_mostly = {
 static const struct nla_policy rtm_mpls_policy[RTA_MAX+1] = {
 	[RTA_DST]		= { .type = NLA_U32 },
 	[RTA_OIF]		= { .type = NLA_U32 },
+	[RTA_MPLS_PAYLOAD_TYPE]	= { .type = NLA_U32 },
 };
 
 struct mpls_route_config {
@@ -270,7 +266,7 @@ struct mpls_route_config {
 	u32			rc_output_labels;
 	u32			rc_output_label[MAX_NEW_LABELS];
 	u32			rc_nlflags;
-	enum mpls_payload_type	rc_payload_type;
+	u32			rc_payload_type;
 	struct nl_info		rc_nlinfo;
 };
 
@@ -781,6 +777,24 @@ static int rtm_to_route_config(struct sk_buff *skb,  struct nlmsghdr *nlh,
 			memcpy(cfg->rc_via, via->rtvia_addr, cfg->rc_via_alen);
 			break;
 		}
+		case RTA_MPLS_PAYLOAD_TYPE:
+			cfg->rc_payload_type = nla_get_u32(nla);
+
+			/* Ensure there are no unsupported flags */
+			if (cfg->rc_payload_type &
+			    ~(RTMPT_TYPE_MASK | RTMPT_ALL_FLAGS))
+				goto errout;
+
+			switch (cfg->rc_payload_type & RTMPT_TYPE_MASK) {
+			case RTMPT_IP:
+			case RTMPT_IPV4:
+			case RTMPT_IPV6:
+				break;
+			default:
+				goto errout;
+			}
+
+			break;
 		default:
 			/* Unsupported attribute */
 			goto errout;
@@ -849,6 +863,9 @@ static int mpls_dump_route(struct sk_buff *skb, u32 portid, u32 seq, int event,
 		goto nla_put_failure;
 	if (nla_put_labels(skb, RTA_DST, 1, &label))
 		goto nla_put_failure;
+	if (rt->rt_payload_type &&
+	    nla_put_u32(skb, RTA_MPLS_PAYLOAD_TYPE, rt->rt_payload_type))
+		goto nla_put_failure;
 
 	nlmsg_end(skb, nlh);
 	return 0;
@@ -899,6 +916,8 @@ static inline size_t lfib_nlmsg_size(struct mpls_route *rt)
 		payload += nla_total_size(rt->rt_labels * 4);
 	if (rt->rt_dev)					/* RTA_OIF */
 		payload += nla_total_size(4);
+	if (rt->rt_payload_type)
+		payload += nla_total_size(4); /* RTA_MPLS_PAYLOAD_TYPE */
 	return payload;
 }
 
@@ -955,7 +974,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
 			goto nort0;
 		RCU_INIT_POINTER(rt0->rt_dev, lo);
 		rt0->rt_protocol = RTPROT_KERNEL;
-		rt0->rt_payload_type = MPT_IPV4;
+		rt0->rt_payload_type = RTMPT_IPV4;
 		rt0->rt_via_table = NEIGH_LINK_TABLE;
 		memcpy(rt0->rt_via, lo->dev_addr, lo->addr_len);
 	}
@@ -966,7 +985,7 @@ static int resize_platform_label_table(struct net *net, size_t limit)
 			goto nort2;
 		RCU_INIT_POINTER(rt2->rt_dev, lo);
 		rt2->rt_protocol = RTPROT_KERNEL;
-		rt2->rt_payload_type = MPT_IPV6;
+		rt2->rt_payload_type = RTMPT_IPV6;
 		rt2->rt_via_table = NEIGH_LINK_TABLE;
 		memcpy(rt2->rt_via, lo->dev_addr, lo->addr_len);
 	}
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH net-next v4 6/6] mpls: Prevent use of implicit NULL label as outgoing label
  2015-04-14 22:44     ` [PATCH net-next v4 0/6] " Robert Shearman
                         ` (4 preceding siblings ...)
  2015-04-14 22:45       ` [PATCH net-next v4 5/6] mpls: Differentiate implicit-null and unlabeled neighbours Robert Shearman
@ 2015-04-14 22:45       ` Robert Shearman
  2015-04-21 20:34       ` [PATCH 0/3] mpls: ABI changes for security and correctness Robert Shearman
  6 siblings, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-04-14 22:45 UTC (permalink / raw)
  To: davem, ebiederm; +Cc: netdev, Robert Shearman

The reserved implicit-NULL label isn't allowed to appear in the label
stack for packets, so make it an error for the control plane to
specify it as an outgoing label.
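
The check amounts to the following userspace sketch. The function name
check_outgoing_labels() is hypothetical, and the labels are assumed to be
already decoded to plain 20-bit values; the value 3 for Implicit NULL comes
from RFC3032.

```c
#include <errno.h>
#include <stdint.h>

#define LABEL_IMPLICIT_NULL	3 /* RFC3032 */

/* Hypothetical model of the new check in nla_get_labels(): an outgoing
 * label stack containing Implicit NULL is rejected, since that label is
 * only ever signalled and never appears on the wire.
 */
static int check_outgoing_labels(const uint32_t *labels, unsigned int count)
{
	unsigned int i;

	for (i = 0; i < count; i++) {
		if (labels[i] == LABEL_IMPLICIT_NULL)
			return -EINVAL;
	}

	return 0;
}
```

Other reserved labels, such as IPv4 Explicit NULL (0), are left alone by
this particular check.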

Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 net/mpls/af_mpls.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index e99f88556d6b..414aacb84089 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -664,6 +664,15 @@ int nla_get_labels(const struct nlattr *nla,
 		if ((dec.bos != bos) || dec.ttl || dec.tc)
 			return -EINVAL;
 
+		switch (dec.label) {
+		case LABEL_IMPLICIT_NULL:
+			/* RFC3032: This is a label that an LSR may
+			 * assign and distribute, but which never
+			 * actually appears in the encapsulation.
+			 */
+			return -EINVAL;
+		}
+
 		label[i] = dec.label;
 	}
 	*labels = nla_labels;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 0/3] mpls: ABI changes for security and correctness
  2015-04-14 22:44     ` [PATCH net-next v4 0/6] " Robert Shearman
                         ` (5 preceding siblings ...)
  2015-04-14 22:45       ` [PATCH net-next v4 6/6] mpls: Prevent use of implicit NULL label as outgoing label Robert Shearman
@ 2015-04-21 20:34       ` Robert Shearman
  2015-04-21 20:34         ` [PATCH 1/3] mpls: Per-device MPLS state Robert Shearman
                           ` (4 more replies)
  6 siblings, 5 replies; 68+ messages in thread
From: Robert Shearman @ 2015-04-21 20:34 UTC (permalink / raw)
  To: davem, ebiederm; +Cc: netdev, Robert Shearman

These changes stop mpls from being enabled by default on all
interfaces, for security, and ensure that a label that is not valid
as an outgoing label cannot be added in mpls routes.

This series contains three ABI/behaviour-affecting changes which have
been split out from "[PATCH net-next v4 0/6] mpls: Behaviour-changing
improvements" without any further modification. These changes need to
be considered for 4.1 otherwise we'll be stuck with the current
behaviour/ABI forever.

Robert Shearman (3):
  mpls: Per-device MPLS state
  mpls: Per-device enabling of packet input
  mpls: Prevent use of implicit NULL label as outgoing label

 Documentation/networking/mpls-sysctl.txt |   9 +++
 include/linux/netdevice.h                |   4 +
 net/mpls/af_mpls.c                       | 132 ++++++++++++++++++++++++++++++-
 net/mpls/internal.h                      |   6 ++
 4 files changed, 148 insertions(+), 3 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 1/3] mpls: Per-device MPLS state
  2015-04-21 20:34       ` [PATCH 0/3] mpls: ABI changes for security and correctness Robert Shearman
@ 2015-04-21 20:34         ` Robert Shearman
  2015-04-21 20:34         ` [PATCH 2/3] mpls: Per-device enabling of packet input Robert Shearman
                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-04-21 20:34 UTC (permalink / raw)
  To: davem, ebiederm; +Cc: netdev, Robert Shearman

Add per-device MPLS state to supported interfaces. Use the presence of
this state in mpls_route_add to determine that this is a supported
interface.

Use the presence of mpls_dev to drop packets that arrived on an
unsupported interface - previously they were allowed through.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 include/linux/netdevice.h |  4 ++++
 net/mpls/af_mpls.c        | 50 +++++++++++++++++++++++++++++++++++++++++++++--
 net/mpls/internal.h       |  3 +++
 3 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index bcbde799ec69..dae106a3a998 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -60,6 +60,7 @@ struct phy_device;
 struct wireless_dev;
 /* 802.15.4 specific */
 struct wpan_dev;
+struct mpls_dev;
 
 void netdev_set_default_ethtool_ops(struct net_device *dev,
 				    const struct ethtool_ops *ops);
@@ -1627,6 +1628,9 @@ struct net_device {
 	void			*ax25_ptr;
 	struct wireless_dev	*ieee80211_ptr;
 	struct wpan_dev		*ieee802154_ptr;
+#if IS_ENABLED(CONFIG_MPLS_ROUTING)
+	struct mpls_dev __rcu	*mpls_ptr;
+#endif
 
 /*
  * Cache lines mostly used on receive path (including eth_type_trans())
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index db8a2ea6d4de..ad45017eed99 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -53,6 +53,11 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
 	return rt;
 }
 
+static inline struct mpls_dev *mpls_dev_get(const struct net_device *dev)
+{
+	return rcu_dereference_rtnl(dev->mpls_ptr);
+}
+
 static bool mpls_output_possible(const struct net_device *dev)
 {
 	return dev && (dev->flags & IFF_UP) && netif_carrier_ok(dev);
@@ -136,6 +141,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	struct mpls_route *rt;
 	struct mpls_entry_decoded dec;
 	struct net_device *out_dev;
+	struct mpls_dev *mdev;
 	unsigned int hh_len;
 	unsigned int new_header_size;
 	unsigned int mtu;
@@ -143,6 +149,10 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 
 	/* Careful this entire function runs inside of an rcu critical section */
 
+	mdev = mpls_dev_get(dev);
+	if (!mdev)
+		goto drop;
+
 	if (skb->pkt_type != PACKET_HOST)
 		goto drop;
 
@@ -352,9 +362,9 @@ static int mpls_route_add(struct mpls_route_config *cfg)
 	if (!dev)
 		goto errout;
 
-	/* For now just support ethernet devices */
+	/* Ensure this is a supported device */
 	err = -EINVAL;
-	if ((dev->type != ARPHRD_ETHER) && (dev->type != ARPHRD_LOOPBACK))
+	if (!mpls_dev_get(dev))
 		goto errout;
 
 	err = -EINVAL;
@@ -428,10 +438,27 @@ errout:
 	return err;
 }
 
+static struct mpls_dev *mpls_add_dev(struct net_device *dev)
+{
+	struct mpls_dev *mdev;
+	int err = -ENOMEM;
+
+	ASSERT_RTNL();
+
+	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
+	if (!mdev)
+		return ERR_PTR(err);
+
+	rcu_assign_pointer(dev->mpls_ptr, mdev);
+
+	return mdev;
+}
+
 static void mpls_ifdown(struct net_device *dev)
 {
 	struct mpls_route __rcu **platform_label;
 	struct net *net = dev_net(dev);
+	struct mpls_dev *mdev;
 	unsigned index;
 
 	platform_label = rtnl_dereference(net->mpls.platform_label);
@@ -443,14 +470,33 @@ static void mpls_ifdown(struct net_device *dev)
 			continue;
 		rt->rt_dev = NULL;
 	}
+
+	mdev = mpls_dev_get(dev);
+	if (!mdev)
+		return;
+
+	RCU_INIT_POINTER(dev->mpls_ptr, NULL);
+
+	kfree(mdev);
 }
 
 static int mpls_dev_notify(struct notifier_block *this, unsigned long event,
 			   void *ptr)
 {
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct mpls_dev *mdev;
 
 	switch(event) {
+	case NETDEV_REGISTER:
+		/* For now just support ethernet devices */
+		if ((dev->type == ARPHRD_ETHER) ||
+		    (dev->type == ARPHRD_LOOPBACK)) {
+			mdev = mpls_add_dev(dev);
+			if (IS_ERR(mdev))
+				return notifier_from_errno(PTR_ERR(mdev));
+		}
+		break;
+
 	case NETDEV_UNREGISTER:
 		mpls_ifdown(dev);
 		break;
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index fb6de92052c4..8090cb3099b4 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -22,6 +22,9 @@ struct mpls_entry_decoded {
 	u8 bos;
 };
 
+struct mpls_dev {
+};
+
 struct sk_buff;
 
 static inline struct mpls_shim_hdr *mpls_hdr(const struct sk_buff *skb)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 2/3] mpls: Per-device enabling of packet input
  2015-04-21 20:34       ` [PATCH 0/3] mpls: ABI changes for security and correctness Robert Shearman
  2015-04-21 20:34         ` [PATCH 1/3] mpls: Per-device MPLS state Robert Shearman
@ 2015-04-21 20:34         ` Robert Shearman
  2015-04-21 20:34         ` [PATCH 3/3] mpls: Prevent use of implicit NULL label as outgoing label Robert Shearman
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-04-21 20:34 UTC (permalink / raw)
  To: davem, ebiederm; +Cc: netdev, Robert Shearman

An MPLS network is a single trust domain where the edges must be in
control of what labels make their way into the core. The simplest way
of ensuring this is for the edge device to always impose the labels,
and not to forward labeled traffic from untrusted neighbours. This
is achieved by allowing a per-device configuration of whether MPLS
traffic input from that interface should be processed or not.

To be secure by default, the default state is changed to MPLS being
disabled on all interfaces (except the loopback) unless explicitly
enabled, and no global option is provided to change the default. Whilst
this differs from other protocols (e.g. IPv6), network operators are
used to explicitly enabling MPLS forwarding on interfaces, and with
the number of links to the MPLS core typically fairly low this doesn't
present too much of a burden on operators.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 Documentation/networking/mpls-sysctl.txt |  9 ++++
 net/mpls/af_mpls.c                       | 75 +++++++++++++++++++++++++++++++-
 net/mpls/internal.h                      |  3 ++
 3 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/mpls-sysctl.txt b/Documentation/networking/mpls-sysctl.txt
index 639ddf0ece9b..9ed15f86c17c 100644
--- a/Documentation/networking/mpls-sysctl.txt
+++ b/Documentation/networking/mpls-sysctl.txt
@@ -18,3 +18,12 @@ platform_labels - INTEGER
 
 	Possible values: 0 - 1048575
 	Default: 0
+
+conf/<interface>/input - BOOL
+	Control whether packets can be input on this interface.
+
+	If disabled, packets will be discarded without further
+	processing.
+
+	0 - disabled (default)
+	not 0 - enabled
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index ad45017eed99..7ac93082e3dc 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -150,7 +150,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	/* Careful this entire function runs inside of an rcu critical section */
 
 	mdev = mpls_dev_get(dev);
-	if (!mdev)
+	if (!mdev || !mdev->input_enabled)
 		goto drop;
 
 	if (skb->pkt_type != PACKET_HOST)
@@ -438,6 +438,60 @@ errout:
 	return err;
 }
 
+#define MPLS_PERDEV_SYSCTL_OFFSET(field)	\
+	(&((struct mpls_dev *)0)->field)
+
+static const struct ctl_table mpls_dev_table[] = {
+	{
+		.procname	= "input",
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+		.data		= MPLS_PERDEV_SYSCTL_OFFSET(input_enabled),
+	},
+	{ }
+};
+
+static int mpls_dev_sysctl_register(struct net_device *dev,
+				    struct mpls_dev *mdev)
+{
+	char path[sizeof("net/mpls/conf/") + IFNAMSIZ];
+	struct ctl_table *table;
+	int i;
+
+	table = kmemdup(&mpls_dev_table, sizeof(mpls_dev_table), GFP_KERNEL);
+	if (!table)
+		goto out;
+
+	/* Table data contains only offsets relative to the base of
+	 * the mdev at this point, so make them absolute.
+	 */
+	for (i = 0; i < ARRAY_SIZE(mpls_dev_table); i++)
+		table[i].data = (char *)mdev + (uintptr_t)table[i].data;
+
+	snprintf(path, sizeof(path), "net/mpls/conf/%s", dev->name);
+
+	mdev->sysctl = register_net_sysctl(dev_net(dev), path, table);
+	if (!mdev->sysctl)
+		goto free;
+
+	return 0;
+
+free:
+	kfree(table);
+out:
+	return -ENOBUFS;
+}
+
+static void mpls_dev_sysctl_unregister(struct mpls_dev *mdev)
+{
+	struct ctl_table *table;
+
+	table = mdev->sysctl->ctl_table_arg;
+	unregister_net_sysctl_table(mdev->sysctl);
+	kfree(table);
+}
+
 static struct mpls_dev *mpls_add_dev(struct net_device *dev)
 {
 	struct mpls_dev *mdev;
@@ -449,9 +503,24 @@ static struct mpls_dev *mpls_add_dev(struct net_device *dev)
 	if (!mdev)
 		return ERR_PTR(err);
 
+	/* Enable MPLS by default on loopback devices, since this
+	 * doesn't represent a security boundary and is required for the
+	 * lookup of inner labels for LSPs terminating on this router.
+	 */
+	if (dev->flags & IFF_LOOPBACK)
+		mdev->input_enabled = 1;
+
+	err = mpls_dev_sysctl_register(dev, mdev);
+	if (err)
+		goto free;
+
 	rcu_assign_pointer(dev->mpls_ptr, mdev);
 
 	return mdev;
+
+free:
+	kfree(mdev);
+	return ERR_PTR(err);
 }
 
 static void mpls_ifdown(struct net_device *dev)
@@ -475,6 +544,8 @@ static void mpls_ifdown(struct net_device *dev)
 	if (!mdev)
 		return;
 
+	mpls_dev_sysctl_unregister(mdev);
+
 	RCU_INIT_POINTER(dev->mpls_ptr, NULL);
 
 	kfree(mdev);
@@ -958,7 +1029,7 @@ static int mpls_platform_labels(struct ctl_table *table, int write,
 	return ret;
 }
 
-static struct ctl_table mpls_table[] = {
+static const struct ctl_table mpls_table[] = {
 	{
 		.procname	= "platform_labels",
 		.data		= NULL,
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index 8090cb3099b4..693877d69606 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -23,6 +23,9 @@ struct mpls_entry_decoded {
 };
 
 struct mpls_dev {
+	int			input_enabled;
+
+	struct ctl_table_header *sysctl;
 };
 
 struct sk_buff;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 68+ messages in thread
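
The per-device sysctl registration in the patch above keeps field offsets in one shared template table, copies that table for each device, and turns the offsets into absolute pointers. What follows is only a minimal userspace sketch of that idea, not the kernel code: `dev_state`, `knob`, `knob_template` and `knobs_for_device` are hypothetical names, `offsetof` replaces the patch's null-pointer macro, and no sysctl API is involved. Unlike the patch's loop, the sketch stops at the sentinel instead of rebasing it as well.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-device state, standing in for struct mpls_dev. */
struct dev_state {
	int input_enabled;
};

/* Hypothetical knob descriptor, standing in for struct ctl_table. */
struct knob {
	const char *name;
	void *data;	/* an offset in the template, a pointer in a copy */
};

/* Shared template: data holds an offset into struct dev_state. */
static const struct knob knob_template[] = {
	{ "input", (void *)offsetof(struct dev_state, input_enabled) },
	{ NULL, NULL },	/* sentinel */
};

#define KNOB_COUNT (sizeof(knob_template) / sizeof(knob_template[0]))

/* Copy the template and rebase every named entry onto one device.
 * The caller owns the returned table and must free() it.
 */
static struct knob *knobs_for_device(struct dev_state *st)
{
	struct knob *copy = malloc(sizeof(knob_template));
	size_t i;

	if (!copy)
		return NULL;
	memcpy(copy, knob_template, sizeof(knob_template));

	/* Offsets become absolute addresses inside *st. */
	for (i = 0; i < KNOB_COUNT && copy[i].name; i++)
		copy[i].data = (char *)st + (uintptr_t)copy[i].data;

	return copy;
}
```

Writing through `data` in the table returned for one device changes only that device's `input_enabled`, which is why each device needs its own copy of the table rather than a pointer to the shared template.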

* [PATCH 3/3] mpls: Prevent use of implicit NULL label as outgoing label
  2015-04-21 20:34       ` [PATCH 0/3] mpls: ABI changes for security and correctness Robert Shearman
  2015-04-21 20:34         ` [PATCH 1/3] mpls: Per-device MPLS state Robert Shearman
  2015-04-21 20:34         ` [PATCH 2/3] mpls: Per-device enabling of packet input Robert Shearman
@ 2015-04-21 20:34         ` Robert Shearman
  2015-04-22  0:29         ` [PATCH 0/3] mpls: ABI changes for security and correctness Eric W. Biederman
  2015-04-22 10:14         ` [PATCH v2 " Robert Shearman
  4 siblings, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-04-21 20:34 UTC (permalink / raw)
  To: davem, ebiederm; +Cc: netdev, Robert Shearman

The reserved implicit-NULL label isn't allowed to appear in the label
stack for packets, so make it an error for the control plane to
specify it as an outgoing label.

Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 net/mpls/af_mpls.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 7ac93082e3dc..eb8dc411859d 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -653,6 +653,15 @@ int nla_get_labels(const struct nlattr *nla,
 		if ((dec.bos != bos) || dec.ttl || dec.tc)
 			return -EINVAL;
 
+		switch (dec.label) {
+		case LABEL_IMPLICIT_NULL:
+			/* RFC3032: This is a label that an LSR may
+			 * assign and distribute, but which never
+			 * actually appears in the encapsulation.
+			 */
+			return -EINVAL;
+		}
+
 		label[i] = dec.label;
 	}
 	*labels = nla_labels;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 68+ messages in thread
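
The check added to nla_get_labels() above can be illustrated in userspace. This is a sketch under stated assumptions rather than the kernel function: entries are taken in host byte order instead of big-endian netlink payload, `lse_decode` and `check_out_labels` are hypothetical names, and the value 3 for the implicit NULL label comes from RFC 3032.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 3032 reserved label value for "Implicit NULL". */
#define SKETCH_LABEL_IMPLICIT_NULL 3u

struct lse_decoded {
	uint32_t label;
	uint8_t ttl;
	uint8_t tc;
	uint8_t bos;
};

/* Decode a host-order label stack entry:
 * label (20 bits) | traffic class (3) | bottom of stack (1) | TTL (8).
 */
static struct lse_decoded lse_decode(uint32_t entry)
{
	struct lse_decoded dec;

	dec.label = entry >> 12;
	dec.tc = (entry >> 9) & 0x7;
	dec.bos = (entry >> 8) & 0x1;
	dec.ttl = entry & 0xff;
	return dec;
}

/* Validate n outgoing entries and copy their labels into labels[].
 * Returns 0 on success or -EINVAL on the first invalid entry.
 */
static int check_out_labels(const uint32_t *entries, size_t n,
			    uint32_t *labels)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct lse_decoded dec = lse_decode(entries[i]);
		uint8_t bos = (i == n - 1);

		/* Only the last entry may carry the bottom-of-stack
		 * bit, and TTL/TC must not be set by the control plane.
		 */
		if (dec.bos != bos || dec.ttl || dec.tc)
			return -EINVAL;

		/* Implicit NULL never appears in the encapsulation. */
		if (dec.label == SKETCH_LABEL_IMPLICIT_NULL)
			return -EINVAL;

		labels[i] = dec.label;
	}
	return 0;
}
```

Rejecting the label when the route is configured means a route with an unusable outgoing label is never installed, so the forwarding path does not need a matching check.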

* Re: [PATCH 0/3] mpls: ABI changes for security and correctness
  2015-04-21 20:34       ` [PATCH 0/3] mpls: ABI changes for security and correctness Robert Shearman
                           ` (2 preceding siblings ...)
  2015-04-21 20:34         ` [PATCH 3/3] mpls: Prevent use of implicit NULL label as outgoing label Robert Shearman
@ 2015-04-22  0:29         ` Eric W. Biederman
  2015-04-22  2:12           ` David Miller
  2015-04-22 10:10           ` Robert Shearman
  2015-04-22 10:14         ` [PATCH v2 " Robert Shearman
  4 siblings, 2 replies; 68+ messages in thread
From: Eric W. Biederman @ 2015-04-22  0:29 UTC (permalink / raw)
  To: Robert Shearman; +Cc: davem, netdev

Robert Shearman <rshearma@brocade.com> writes:

> These changes make mpls not be enabled by default on all
> interfaces when in use for security, along with ensuring that a label
> not valid as an outgoing label can be added in mpls routes.
>
> This series contains three ABI/behaviour-affecting changes which have
> been split out from "[PATCH net-next v4 0/6] mpls: Behaviour-changing
> improvements" without any further modification. These changes need to
> be considered for 4.1 otherwise we'll be stuck with the current
> behaviour/ABI forever.

I don't like the difference in default between loopback and everything
else.  That just seems like an extra arbitrary rule.

Otherwise:
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

Not that I expect Dave Miller is taking patches during the merge window.

> Robert Shearman (3):
>   mpls: Per-device MPLS state
>   mpls: Per-device enabling of packet input
>   mpls: Prevent use of implicit NULL label as outgoing label
>
>  Documentation/networking/mpls-sysctl.txt |   9 +++
>  include/linux/netdevice.h                |   4 +
>  net/mpls/af_mpls.c                       | 132 ++++++++++++++++++++++++++++++-
>  net/mpls/internal.h                      |   6 ++
>  4 files changed, 148 insertions(+), 3 deletions(-)

Eric

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 0/3] mpls: ABI changes for security and correctness
  2015-04-22  0:29         ` [PATCH 0/3] mpls: ABI changes for security and correctness Eric W. Biederman
@ 2015-04-22  2:12           ` David Miller
  2015-04-22 10:10           ` Robert Shearman
  1 sibling, 0 replies; 68+ messages in thread
From: David Miller @ 2015-04-22  2:12 UTC (permalink / raw)
  To: ebiederm; +Cc: rshearma, netdev

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Tue, 21 Apr 2015 19:29:42 -0500

> Robert Shearman <rshearma@brocade.com> writes:
> 
>> These changes make mpls not be enabled by default on all
>> interfaces when in use for security, along with ensuring that a label
>> not valid as an outgoing label can be added in mpls routes.
>>
>> This series contains three ABI/behaviour-affecting changes which have
>> been split out from "[PATCH net-next v4 0/6] mpls: Behaviour-changing
>> improvements" without any further modification. These changes need to
>> be considered for 4.1 otherwise we'll be stuck with the current
>> behaviour/ABI forever.
> 
> I don't like the difference in default between loopback and everything
> else.  That just seems like an extra arbitrary rule.
> 
> Otherwise:
> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> Not that I expect Dave Miller is taking patches during the merge window.

Eric, you say you disagree with the loopback vs. everything else
behavior, yet you're ACK'ing this.

Please don't say something like that because it is confusing and
I can't tell what you want me to do.

If you're willing to accept the series as is, say so: "Even though
I disagree with X, I'm ok with this series for now."

If you want changes before the series gets applied: "I want X
changed to Y, and with that I give my ACK."

Thanks.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 0/3] mpls: ABI changes for security and correctness
  2015-04-22  0:29         ` [PATCH 0/3] mpls: ABI changes for security and correctness Eric W. Biederman
  2015-04-22  2:12           ` David Miller
@ 2015-04-22 10:10           ` Robert Shearman
  1 sibling, 0 replies; 68+ messages in thread
From: Robert Shearman @ 2015-04-22 10:10 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: davem, netdev

On 22/04/15 01:29, Eric W. Biederman wrote:
> Robert Shearman <rshearma@brocade.com> writes:
>
>> These changes make mpls not be enabled by default on all
>> interfaces when in use for security, along with ensuring that a label
>> not valid as an outgoing label can be added in mpls routes.
>>
>> This series contains three ABI/behaviour-affecting changes which have
>> been split out from "[PATCH net-next v4 0/6] mpls: Behaviour-changing
>> improvements" without any further modification. These changes need to
>> be considered for 4.1 otherwise we'll be stuck with the current
>> behaviour/ABI forever.
>
> I don't like the difference in default between loopback and everything
> else.  That just seems like an extra arbitrary rule.

Ok, I'll get rid of that.

>
> Otherwise:
> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
>
> Not that I expect Dave Miller is taking patches during the merge window.
>
>> Robert Shearman (3):
>>    mpls: Per-device MPLS state
>>    mpls: Per-device enabling of packet input
>>    mpls: Prevent use of implicit NULL label as outgoing label
>>
>>   Documentation/networking/mpls-sysctl.txt |   9 +++
>>   include/linux/netdevice.h                |   4 +
>>   net/mpls/af_mpls.c                       | 132 ++++++++++++++++++++++++++++++-
>>   net/mpls/internal.h                      |   6 ++
>>   4 files changed, 148 insertions(+), 3 deletions(-)
>
> Eric
>

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v2 0/3] mpls: ABI changes for security and correctness
  2015-04-21 20:34       ` [PATCH 0/3] mpls: ABI changes for security and correctness Robert Shearman
                           ` (3 preceding siblings ...)
  2015-04-22  0:29         ` [PATCH 0/3] mpls: ABI changes for security and correctness Eric W. Biederman
@ 2015-04-22 10:14         ` Robert Shearman
  2015-04-22 10:14           ` [PATCH v2 1/3] mpls: Per-device MPLS state Robert Shearman
                             ` (3 more replies)
  4 siblings, 4 replies; 68+ messages in thread
From: Robert Shearman @ 2015-04-22 10:14 UTC (permalink / raw)
  To: davem, ebiederm; +Cc: netdev, Robert Shearman

V2:
 - don't treat loopback interfaces specially by enabling mpls by
   default

For security, these changes stop mpls from being enabled by default on
all interfaces when in use, and ensure that a label that is not valid as
an outgoing label cannot be added in mpls routes.

This series contains three ABI/behaviour-affecting changes which have
been split out from "[PATCH net-next v4 0/6] mpls: Behaviour-changing
improvements" without any further modification. These changes need to
be considered for 4.1 otherwise we'll be stuck with the current
behaviour/ABI forever.

Robert Shearman (3):
  mpls: Per-device MPLS state
  mpls: Per-device enabling of packet input
  mpls: Prevent use of implicit NULL label as outgoing label

 Documentation/networking/mpls-sysctl.txt |   9 +++
 include/linux/netdevice.h                |   4 +
 net/mpls/af_mpls.c                       | 125 ++++++++++++++++++++++++++++++-
 net/mpls/internal.h                      |   6 ++
 4 files changed, 141 insertions(+), 3 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v2 1/3] mpls: Per-device MPLS state
  2015-04-22 10:14         ` [PATCH v2 " Robert Shearman
@ 2015-04-22 10:14           ` Robert Shearman
  2015-04-22 15:25             ` Eric W. Biederman
  2015-04-22 10:14           ` [PATCH v2 2/3] mpls: Per-device enabling of packet input Robert Shearman
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 68+ messages in thread
From: Robert Shearman @ 2015-04-22 10:14 UTC (permalink / raw)
  To: davem, ebiederm; +Cc: netdev, Robert Shearman

Add per-device MPLS state to supported interfaces. Use the presence of
this state in mpls_route_add to determine that this is a supported
interface.

Use the presence of mpls_dev to drop packets that arrived on an
unsupported interface - previously they were allowed through.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 include/linux/netdevice.h |  4 ++++
 net/mpls/af_mpls.c        | 50 +++++++++++++++++++++++++++++++++++++++++++++--
 net/mpls/internal.h       |  3 +++
 3 files changed, 55 insertions(+), 2 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index bcbde799ec69..dae106a3a998 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -60,6 +60,7 @@ struct phy_device;
 struct wireless_dev;
 /* 802.15.4 specific */
 struct wpan_dev;
+struct mpls_dev;
 
 void netdev_set_default_ethtool_ops(struct net_device *dev,
 				    const struct ethtool_ops *ops);
@@ -1627,6 +1628,9 @@ struct net_device {
 	void			*ax25_ptr;
 	struct wireless_dev	*ieee80211_ptr;
 	struct wpan_dev		*ieee802154_ptr;
+#if IS_ENABLED(CONFIG_MPLS_ROUTING)
+	struct mpls_dev __rcu	*mpls_ptr;
+#endif
 
 /*
  * Cache lines mostly used on receive path (including eth_type_trans())
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index db8a2ea6d4de..ad45017eed99 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -53,6 +53,11 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
 	return rt;
 }
 
+static inline struct mpls_dev *mpls_dev_get(const struct net_device *dev)
+{
+	return rcu_dereference_rtnl(dev->mpls_ptr);
+}
+
 static bool mpls_output_possible(const struct net_device *dev)
 {
 	return dev && (dev->flags & IFF_UP) && netif_carrier_ok(dev);
@@ -136,6 +141,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	struct mpls_route *rt;
 	struct mpls_entry_decoded dec;
 	struct net_device *out_dev;
+	struct mpls_dev *mdev;
 	unsigned int hh_len;
 	unsigned int new_header_size;
 	unsigned int mtu;
@@ -143,6 +149,10 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 
 	/* Careful this entire function runs inside of an rcu critical section */
 
+	mdev = mpls_dev_get(dev);
+	if (!mdev)
+		goto drop;
+
 	if (skb->pkt_type != PACKET_HOST)
 		goto drop;
 
@@ -352,9 +362,9 @@ static int mpls_route_add(struct mpls_route_config *cfg)
 	if (!dev)
 		goto errout;
 
-	/* For now just support ethernet devices */
+	/* Ensure this is a supported device */
 	err = -EINVAL;
-	if ((dev->type != ARPHRD_ETHER) && (dev->type != ARPHRD_LOOPBACK))
+	if (!mpls_dev_get(dev))
 		goto errout;
 
 	err = -EINVAL;
@@ -428,10 +438,27 @@ errout:
 	return err;
 }
 
+static struct mpls_dev *mpls_add_dev(struct net_device *dev)
+{
+	struct mpls_dev *mdev;
+	int err = -ENOMEM;
+
+	ASSERT_RTNL();
+
+	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
+	if (!mdev)
+		return ERR_PTR(err);
+
+	rcu_assign_pointer(dev->mpls_ptr, mdev);
+
+	return mdev;
+}
+
 static void mpls_ifdown(struct net_device *dev)
 {
 	struct mpls_route __rcu **platform_label;
 	struct net *net = dev_net(dev);
+	struct mpls_dev *mdev;
 	unsigned index;
 
 	platform_label = rtnl_dereference(net->mpls.platform_label);
@@ -443,14 +470,33 @@ static void mpls_ifdown(struct net_device *dev)
 			continue;
 		rt->rt_dev = NULL;
 	}
+
+	mdev = mpls_dev_get(dev);
+	if (!mdev)
+		return;
+
+	RCU_INIT_POINTER(dev->mpls_ptr, NULL);
+
+	kfree(mdev);
 }
 
 static int mpls_dev_notify(struct notifier_block *this, unsigned long event,
 			   void *ptr)
 {
 	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+	struct mpls_dev *mdev;
 
 	switch(event) {
+	case NETDEV_REGISTER:
+		/* For now just support ethernet devices */
+		if ((dev->type == ARPHRD_ETHER) ||
+		    (dev->type == ARPHRD_LOOPBACK)) {
+			mdev = mpls_add_dev(dev);
+			if (IS_ERR(mdev))
+				return notifier_from_errno(PTR_ERR(mdev));
+		}
+		break;
+
 	case NETDEV_UNREGISTER:
 		mpls_ifdown(dev);
 		break;
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index fb6de92052c4..8090cb3099b4 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -22,6 +22,9 @@ struct mpls_entry_decoded {
 	u8 bos;
 };
 
+struct mpls_dev {
+};
+
 struct sk_buff;
 
 static inline struct mpls_shim_hdr *mpls_hdr(const struct sk_buff *skb)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 68+ messages in thread
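
The patch above uses the presence of per-device state as the single test for "MPLS is supported on this interface": the type check moves to registration time, and both route addition and forwarding only look at the pointer. A simplified userspace model of that pattern, with hypothetical `sketch_*` names and plain pointers in place of the RCU accessors, might look like this:

```c
#include <errno.h>
#include <stdlib.h>

enum sketch_dev_type { SKETCH_ETHER, SKETCH_LOOPBACK, SKETCH_OTHER };

/* Placeholder for struct mpls_dev; the patch adds it with no fields. */
struct sketch_mpls_dev {
	int unused;
};

struct sketch_netdev {
	enum sketch_dev_type type;
	struct sketch_mpls_dev *mpls_ptr;	/* NULL: MPLS unsupported */
};

/* Registration: only supported device types get per-device state. */
static int sketch_register(struct sketch_netdev *dev)
{
	if (dev->type != SKETCH_ETHER && dev->type != SKETCH_LOOPBACK)
		return 0;

	dev->mpls_ptr = calloc(1, sizeof(*dev->mpls_ptr));
	return dev->mpls_ptr ? 0 : -ENOMEM;
}

/* Route addition: refuse devices that have no MPLS state. */
static int sketch_route_add(const struct sketch_netdev *dev)
{
	return dev->mpls_ptr ? 0 : -EINVAL;
}

/* Forwarding: 1 if the packet may proceed, 0 if it is dropped. */
static int sketch_forward(const struct sketch_netdev *dev)
{
	return dev->mpls_ptr != NULL;
}

/* Unregistration: clear the pointer so later lookups see no state. */
static void sketch_unregister(struct sketch_netdev *dev)
{
	free(dev->mpls_ptr);
	dev->mpls_ptr = NULL;
}
```

The model leaves out locking and RCU entirely, so it says nothing about when the state may safely be freed while readers are active.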

* [PATCH v2 2/3] mpls: Per-device enabling of packet input
  2015-04-22 10:14         ` [PATCH v2 " Robert Shearman
  2015-04-22 10:14           ` [PATCH v2 1/3] mpls: Per-device MPLS state Robert Shearman
@ 2015-04-22 10:14           ` Robert Shearman
  2015-04-22 16:27             ` Eric W. Biederman
  2015-04-22 10:14           ` [PATCH v2 3/3] mpls: Prevent use of implicit NULL label as outgoing label Robert Shearman
  2015-04-22 16:47           ` [PATCH v2 0/3] mpls: ABI changes for security and correctness Eric W. Biederman
  3 siblings, 1 reply; 68+ messages in thread
From: Robert Shearman @ 2015-04-22 10:14 UTC (permalink / raw)
  To: davem, ebiederm; +Cc: netdev, Robert Shearman

An MPLS network is a single trust domain where the edges must be in
control of what labels make their way into the core. The simplest way
of ensuring this is for the edge device to always impose the labels,
and not to forward labeled traffic from untrusted neighbours. This
is achieved by allowing a per-device configuration of whether MPLS
traffic input from that interface should be processed or not.

To be secure by default, the default state is changed to MPLS being
disabled on all interfaces unless explicitly enabled, and no global
option is provided to change the default. Whilst this differs from
other protocols (e.g. IPv6), network operators are used to explicitly
enabling MPLS forwarding on interfaces, and with the number of links
to the MPLS core typically fairly low this doesn't present too much of
a burden on operators.

Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 Documentation/networking/mpls-sysctl.txt |  9 +++++
 net/mpls/af_mpls.c                       | 68 +++++++++++++++++++++++++++++++-
 net/mpls/internal.h                      |  3 ++
 3 files changed, 78 insertions(+), 2 deletions(-)

diff --git a/Documentation/networking/mpls-sysctl.txt b/Documentation/networking/mpls-sysctl.txt
index 639ddf0ece9b..9ed15f86c17c 100644
--- a/Documentation/networking/mpls-sysctl.txt
+++ b/Documentation/networking/mpls-sysctl.txt
@@ -18,3 +18,12 @@ platform_labels - INTEGER
 
 	Possible values: 0 - 1048575
 	Default: 0
+
+conf/<interface>/input - BOOL
+	Control whether packets can be input on this interface.
+
+	If disabled, packets will be discarded without further
+	processing.
+
+	0 - disabled (default)
+	not 0 - enabled
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index ad45017eed99..9fdd94cba83e 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -150,7 +150,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
 	/* Careful this entire function runs inside of an rcu critical section */
 
 	mdev = mpls_dev_get(dev);
-	if (!mdev)
+	if (!mdev || !mdev->input_enabled)
 		goto drop;
 
 	if (skb->pkt_type != PACKET_HOST)
@@ -438,6 +438,60 @@ errout:
 	return err;
 }
 
+#define MPLS_PERDEV_SYSCTL_OFFSET(field)	\
+	(&((struct mpls_dev *)0)->field)
+
+static const struct ctl_table mpls_dev_table[] = {
+	{
+		.procname	= "input",
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+		.data		= MPLS_PERDEV_SYSCTL_OFFSET(input_enabled),
+	},
+	{ }
+};
+
+static int mpls_dev_sysctl_register(struct net_device *dev,
+				    struct mpls_dev *mdev)
+{
+	char path[sizeof("net/mpls/conf/") + IFNAMSIZ];
+	struct ctl_table *table;
+	int i;
+
+	table = kmemdup(&mpls_dev_table, sizeof(mpls_dev_table), GFP_KERNEL);
+	if (!table)
+		goto out;
+
+	/* Table data contains only offsets relative to the base of
+	 * the mdev at this point, so make them absolute.
+	 */
+	for (i = 0; i < ARRAY_SIZE(mpls_dev_table); i++)
+		table[i].data = (char *)mdev + (uintptr_t)table[i].data;
+
+	snprintf(path, sizeof(path), "net/mpls/conf/%s", dev->name);
+
+	mdev->sysctl = register_net_sysctl(dev_net(dev), path, table);
+	if (!mdev->sysctl)
+		goto free;
+
+	return 0;
+
+free:
+	kfree(table);
+out:
+	return -ENOBUFS;
+}
+
+static void mpls_dev_sysctl_unregister(struct mpls_dev *mdev)
+{
+	struct ctl_table *table;
+
+	table = mdev->sysctl->ctl_table_arg;
+	unregister_net_sysctl_table(mdev->sysctl);
+	kfree(table);
+}
+
 static struct mpls_dev *mpls_add_dev(struct net_device *dev)
 {
 	struct mpls_dev *mdev;
@@ -449,9 +503,17 @@ static struct mpls_dev *mpls_add_dev(struct net_device *dev)
 	if (!mdev)
 		return ERR_PTR(err);
 
+	err = mpls_dev_sysctl_register(dev, mdev);
+	if (err)
+		goto free;
+
 	rcu_assign_pointer(dev->mpls_ptr, mdev);
 
 	return mdev;
+
+free:
+	kfree(mdev);
+	return ERR_PTR(err);
 }
 
 static void mpls_ifdown(struct net_device *dev)
@@ -475,6 +537,8 @@ static void mpls_ifdown(struct net_device *dev)
 	if (!mdev)
 		return;
 
+	mpls_dev_sysctl_unregister(mdev);
+
 	RCU_INIT_POINTER(dev->mpls_ptr, NULL);
 
 	kfree(mdev);
@@ -958,7 +1022,7 @@ static int mpls_platform_labels(struct ctl_table *table, int write,
 	return ret;
 }
 
-static struct ctl_table mpls_table[] = {
+static const struct ctl_table mpls_table[] = {
 	{
 		.procname	= "platform_labels",
 		.data		= NULL,
diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index 8090cb3099b4..693877d69606 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h
@@ -23,6 +23,9 @@ struct mpls_entry_decoded {
 };
 
 struct mpls_dev {
+	int			input_enabled;
+
+	struct ctl_table_header *sysctl;
 };
 
 struct sk_buff;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 68+ messages in thread
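
With the patch above applied, input has to be enabled per interface through the new knob, which the registered path "net/mpls/conf/<interface>" and procname "input" place at /proc/sys/net/mpls/conf/<interface>/input. A small userspace sketch of setting it follows; `mpls_input_path` and `mpls_set_input` are hypothetical helper names, and the write is assumed to need root on a kernel that carries this patch.

```c
#include <stdio.h>
#include <string.h>

/* Build "/proc/sys/net/mpls/conf/<ifname>/input" in buf.
 * Returns 0 on success, -1 on a bad name or a too-small buffer.
 */
static int mpls_input_path(char *buf, size_t len, const char *ifname)
{
	int n;

	if (!ifname || !*ifname || strchr(ifname, '/'))
		return -1;

	n = snprintf(buf, len, "/proc/sys/net/mpls/conf/%s/input", ifname);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" or "0" to the per-interface knob.
 * Returns 0 on success, -1 if the knob cannot be opened or written.
 */
static int mpls_set_input(const char *ifname, int enable)
{
	char path[128];
	FILE *f;
	int ok;

	if (mpls_input_path(path, sizeof(path), ifname))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(enable ? "1\n" : "0\n", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Because the default is 0 and the series provides no global override, a call such as `mpls_set_input("eth0", 1)` would be needed for each interface that faces the MPLS core.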

* [PATCH v2 3/3] mpls: Prevent use of implicit NULL label as outgoing label
  2015-04-22 10:14         ` [PATCH v2 " Robert Shearman
  2015-04-22 10:14           ` [PATCH v2 1/3] mpls: Per-device MPLS state Robert Shearman
  2015-04-22 10:14           ` [PATCH v2 2/3] mpls: Per-device enabling of packet input Robert Shearman
@ 2015-04-22 10:14           ` Robert Shearman
  2015-04-22 16:32             ` Eric W. Biederman
  2015-04-22 16:47           ` [PATCH v2 0/3] mpls: ABI changes for security and correctness Eric W. Biederman
  3 siblings, 1 reply; 68+ messages in thread
From: Robert Shearman @ 2015-04-22 10:14 UTC (permalink / raw)
  To: davem, ebiederm; +Cc: netdev, Robert Shearman

The reserved implicit-NULL label isn't allowed to appear in the label
stack for packets, so make it an error for the control plane to
specify it as an outgoing label.

Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Robert Shearman <rshearma@brocade.com>
---
 net/mpls/af_mpls.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index 9fdd94cba83e..954810c76a86 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -646,6 +646,15 @@ int nla_get_labels(const struct nlattr *nla,
 		if ((dec.bos != bos) || dec.ttl || dec.tc)
 			return -EINVAL;
 
+		switch (dec.label) {
+		case LABEL_IMPLICIT_NULL:
+			/* RFC3032: This is a label that an LSR may
+			 * assign and distribute, but which never
+			 * actually appears in the encapsulation.
+			 */
+			return -EINVAL;
+		}
+
 		label[i] = dec.label;
 	}
 	*labels = nla_labels;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 1/3] mpls: Per-device MPLS state
  2015-04-22 10:14           ` [PATCH v2 1/3] mpls: Per-device MPLS state Robert Shearman
@ 2015-04-22 15:25             ` Eric W. Biederman
  0 siblings, 0 replies; 68+ messages in thread
From: Eric W. Biederman @ 2015-04-22 15:25 UTC (permalink / raw)
  To: Robert Shearman; +Cc: davem, netdev

Robert Shearman <rshearma@brocade.com> writes:

> Add per-device MPLS state to supported interfaces. Use the presence of
> this state in mpls_route_add to determine that this is a supported
> interface.
>
> Use the presence of mpls_dev to drop packets that arrived on an
> unsupported interface - previously they were allowed through.

Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>

>
> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: Robert Shearman <rshearma@brocade.com>
> ---
>  include/linux/netdevice.h |  4 ++++
>  net/mpls/af_mpls.c        | 50 +++++++++++++++++++++++++++++++++++++++++++++--
>  net/mpls/internal.h       |  3 +++
>  3 files changed, 55 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index bcbde799ec69..dae106a3a998 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -60,6 +60,7 @@ struct phy_device;
>  struct wireless_dev;
>  /* 802.15.4 specific */
>  struct wpan_dev;
> +struct mpls_dev;
>  
>  void netdev_set_default_ethtool_ops(struct net_device *dev,
>  				    const struct ethtool_ops *ops);
> @@ -1627,6 +1628,9 @@ struct net_device {
>  	void			*ax25_ptr;
>  	struct wireless_dev	*ieee80211_ptr;
>  	struct wpan_dev		*ieee802154_ptr;
> +#if IS_ENABLED(CONFIG_MPLS_ROUTING)
> +	struct mpls_dev __rcu	*mpls_ptr;
> +#endif
>  
>  /*
>   * Cache lines mostly used on receive path (including eth_type_trans())
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index db8a2ea6d4de..ad45017eed99 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -53,6 +53,11 @@ static struct mpls_route *mpls_route_input_rcu(struct net *net, unsigned index)
>  	return rt;
>  }
>  
> +static inline struct mpls_dev *mpls_dev_get(const struct net_device *dev)
> +{
> +	return rcu_dereference_rtnl(dev->mpls_ptr);
> +}
> +
>  static bool mpls_output_possible(const struct net_device *dev)
>  {
>  	return dev && (dev->flags & IFF_UP) && netif_carrier_ok(dev);
> @@ -136,6 +141,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>  	struct mpls_route *rt;
>  	struct mpls_entry_decoded dec;
>  	struct net_device *out_dev;
> +	struct mpls_dev *mdev;
>  	unsigned int hh_len;
>  	unsigned int new_header_size;
>  	unsigned int mtu;
> @@ -143,6 +149,10 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>  
>  	/* Careful this entire function runs inside of an rcu critical section */
>  
> +	mdev = mpls_dev_get(dev);
> +	if (!mdev)
> +		goto drop;
> +
>  	if (skb->pkt_type != PACKET_HOST)
>  		goto drop;
>  
> @@ -352,9 +362,9 @@ static int mpls_route_add(struct mpls_route_config *cfg)
>  	if (!dev)
>  		goto errout;
>  
> -	/* For now just support ethernet devices */
> +	/* Ensure this is a supported device */
>  	err = -EINVAL;
> -	if ((dev->type != ARPHRD_ETHER) && (dev->type != ARPHRD_LOOPBACK))
> +	if (!mpls_dev_get(dev))
>  		goto errout;
>  
>  	err = -EINVAL;
> @@ -428,10 +438,27 @@ errout:
>  	return err;
>  }
>  
> +static struct mpls_dev *mpls_add_dev(struct net_device *dev)
> +{
> +	struct mpls_dev *mdev;
> +	int err = -ENOMEM;
> +
> +	ASSERT_RTNL();
> +
> +	mdev = kzalloc(sizeof(*mdev), GFP_KERNEL);
> +	if (!mdev)
> +		return ERR_PTR(err);
> +
> +	rcu_assign_pointer(dev->mpls_ptr, mdev);
> +
> +	return mdev;
> +}
> +
>  static void mpls_ifdown(struct net_device *dev)
>  {
>  	struct mpls_route __rcu **platform_label;
>  	struct net *net = dev_net(dev);
> +	struct mpls_dev *mdev;
>  	unsigned index;
>  
>  	platform_label = rtnl_dereference(net->mpls.platform_label);
> @@ -443,14 +470,33 @@ static void mpls_ifdown(struct net_device *dev)
>  			continue;
>  		rt->rt_dev = NULL;
>  	}
> +
> +	mdev = mpls_dev_get(dev);
> +	if (!mdev)
> +		return;
> +
> +	RCU_INIT_POINTER(dev->mpls_ptr, NULL);
> +
> +	kfree(mdev);
>  }
>  
>  static int mpls_dev_notify(struct notifier_block *this, unsigned long event,
>  			   void *ptr)
>  {
>  	struct net_device *dev = netdev_notifier_info_to_dev(ptr);
> +	struct mpls_dev *mdev;
>  
>  	switch(event) {
> +	case NETDEV_REGISTER:
> +		/* For now just support ethernet devices */
> +		if ((dev->type == ARPHRD_ETHER) ||
> +		    (dev->type == ARPHRD_LOOPBACK)) {
> +			mdev = mpls_add_dev(dev);
> +			if (IS_ERR(mdev))
> +				return notifier_from_errno(PTR_ERR(mdev));
> +		}
> +		break;
> +
>  	case NETDEV_UNREGISTER:
>  		mpls_ifdown(dev);
>  		break;
> diff --git a/net/mpls/internal.h b/net/mpls/internal.h
> index fb6de92052c4..8090cb3099b4 100644
> --- a/net/mpls/internal.h
> +++ b/net/mpls/internal.h
> @@ -22,6 +22,9 @@ struct mpls_entry_decoded {
>  	u8 bos;
>  };
>  
> +struct mpls_dev {
> +};
> +
>  struct sk_buff;
>  
>  static inline struct mpls_shim_hdr *mpls_hdr(const struct sk_buff *skb)

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 2/3] mpls: Per-device enabling of packet input
  2015-04-22 10:14           ` [PATCH v2 2/3] mpls: Per-device enabling of packet input Robert Shearman
@ 2015-04-22 16:27             ` Eric W. Biederman
  0 siblings, 0 replies; 68+ messages in thread
From: Eric W. Biederman @ 2015-04-22 16:27 UTC (permalink / raw)
  To: Robert Shearman; +Cc: davem, netdev

Robert Shearman <rshearma@brocade.com> writes:

> An MPLS network is a single trust domain where the edges must be in
> control of what labels make their way into the core. The simplest way
> of ensuring this is for the edge device to always impose the labels,
> and not allow forward labeled traffic from untrusted neighbours. This
> is achieved by allowing a per-device configuration of whether MPLS
> traffic input from that interface should be processed or not.
>
> To be secure by default, the default state is changed to MPLS being
> disabled on all interfaces unless explicitly enabled, and no global
> option is provided to change the default. Whilst this differs from
> other protocols (e.g. IPv6), network operators are used to explicitly
> enabling MPLS forwarding on interfaces, and with the number of links
> to the MPLS core typically fairly low this doesn't present too much of
> a burden on operators.

Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>

> Cc: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: Robert Shearman <rshearma@brocade.com>
> ---
>  Documentation/networking/mpls-sysctl.txt |  9 +++++
>  net/mpls/af_mpls.c                       | 68 +++++++++++++++++++++++++++++++-
>  net/mpls/internal.h                      |  3 ++
>  3 files changed, 78 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/networking/mpls-sysctl.txt b/Documentation/networking/mpls-sysctl.txt
> index 639ddf0ece9b..9ed15f86c17c 100644
> --- a/Documentation/networking/mpls-sysctl.txt
> +++ b/Documentation/networking/mpls-sysctl.txt
> @@ -18,3 +18,12 @@ platform_labels - INTEGER
>  
>  	Possible values: 0 - 1048575
>  	Default: 0
> +
> +conf/<interface>/input - BOOL
> +	Control whether packets can be input on this interface.
> +
> +	If disabled, packets will be discarded without further
> +	processing.
> +
> +	0 - disabled (default)
> +	not 0 - enabled
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index ad45017eed99..9fdd94cba83e 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -150,7 +150,7 @@ static int mpls_forward(struct sk_buff *skb, struct net_device *dev,
>  	/* Careful this entire function runs inside of an rcu critical section */
>  
>  	mdev = mpls_dev_get(dev);
> -	if (!mdev)
> +	if (!mdev || !mdev->input_enabled)
>  		goto drop;
>  
>  	if (skb->pkt_type != PACKET_HOST)
> @@ -438,6 +438,60 @@ errout:
>  	return err;
>  }
>  
> +#define MPLS_PERDEV_SYSCTL_OFFSET(field)	\
> +	(&((struct mpls_dev *)0)->field)
> +
> +static const struct ctl_table mpls_dev_table[] = {
> +	{
> +		.procname	= "input",
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec,
> +		.data		= MPLS_PERDEV_SYSCTL_OFFSET(input_enabled),
> +	},
> +	{ }
> +};
> +
> +static int mpls_dev_sysctl_register(struct net_device *dev,
> +				    struct mpls_dev *mdev)
> +{
> +	char path[sizeof("net/mpls/conf/") + IFNAMSIZ];
> +	struct ctl_table *table;
> +	int i;
> +
> +	table = kmemdup(&mpls_dev_table, sizeof(mpls_dev_table), GFP_KERNEL);
> +	if (!table)
> +		goto out;
> +
> +	/* Table data contains only offsets relative to the base of
> +	 * the mdev at this point, so make them absolute.
> +	 */
> +	for (i = 0; i < ARRAY_SIZE(mpls_dev_table); i++)
> +		table[i].data = (char *)mdev + (uintptr_t)table[i].data;
> +
> +	snprintf(path, sizeof(path), "net/mpls/conf/%s", dev->name);
> +
> +	mdev->sysctl = register_net_sysctl(dev_net(dev), path, table);
> +	if (!mdev->sysctl)
> +		goto free;
> +
> +	return 0;
> +
> +free:
> +	kfree(table);
> +out:
> +	return -ENOBUFS;
> +}
> +
> +static void mpls_dev_sysctl_unregister(struct mpls_dev *mdev)
> +{
> +	struct ctl_table *table;
> +
> +	table = mdev->sysctl->ctl_table_arg;
> +	unregister_net_sysctl_table(mdev->sysctl);
> +	kfree(table);
> +}
> +
>  static struct mpls_dev *mpls_add_dev(struct net_device *dev)
>  {
>  	struct mpls_dev *mdev;
> @@ -449,9 +503,17 @@ static struct mpls_dev *mpls_add_dev(struct net_device *dev)
>  	if (!mdev)
>  		return ERR_PTR(err);
>  
> +	err = mpls_dev_sysctl_register(dev, mdev);
> +	if (err)
> +		goto free;
> +
>  	rcu_assign_pointer(dev->mpls_ptr, mdev);
>  
>  	return mdev;
> +
> +free:
> +	kfree(mdev);
> +	return ERR_PTR(err);
>  }
>  
>  static void mpls_ifdown(struct net_device *dev)
> @@ -475,6 +537,8 @@ static void mpls_ifdown(struct net_device *dev)
>  	if (!mdev)
>  		return;
>  
> +	mpls_dev_sysctl_unregister(mdev);
> +
>  	RCU_INIT_POINTER(dev->mpls_ptr, NULL);
>  
>  	kfree(mdev);
> @@ -958,7 +1022,7 @@ static int mpls_platform_labels(struct ctl_table *table, int write,
>  	return ret;
>  }
>  
> -static struct ctl_table mpls_table[] = {
> +static const struct ctl_table mpls_table[] = {
>  	{
>  		.procname	= "platform_labels",
>  		.data		= NULL,
> diff --git a/net/mpls/internal.h b/net/mpls/internal.h
> index 8090cb3099b4..693877d69606 100644
> --- a/net/mpls/internal.h
> +++ b/net/mpls/internal.h
> @@ -23,6 +23,9 @@ struct mpls_entry_decoded {
>  };
>  
>  struct mpls_dev {
> +	int			input_enabled;
> +
> +	struct ctl_table_header *sysctl;
>  };
>  
>  struct sk_buff;
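
For readers following the sysctl registration in the patch above: the template table stores field offsets in `.data`, and each per-device copy is rebased onto that device's `struct mpls_dev`. The following is a minimal userspace sketch of that offset-to-pointer technique, not the kernel code. `demo_dev`, `demo_entry`, `demo_template` and `demo_table_for` are hypothetical stand-ins for `struct mpls_dev`, `struct ctl_table`, `mpls_dev_table` and the copy loop. Unlike the quoted loop, the sketch stops before the terminating entry so that its `data` stays NULL.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct mpls_dev. */
struct demo_dev {
	int input_enabled;
};

/* Hypothetical stand-in for struct ctl_table. */
struct demo_entry {
	const char *name;
	void *data;	/* field offset in the template, real pointer after fixup */
};

/* Template shared by all devices: data holds an offset, not an address. */
static const struct demo_entry demo_template[] = {
	{ .name = "input",
	  .data = (void *)offsetof(struct demo_dev, input_enabled) },
	{ 0 }	/* terminator */
};

#define DEMO_ENTRIES (sizeof(demo_template) / sizeof(demo_template[0]))

/* Copy the template and make each real entry point into one device.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static struct demo_entry *demo_table_for(struct demo_dev *dev)
{
	struct demo_entry *table = malloc(sizeof(demo_template));
	size_t i;

	if (!table)
		return NULL;
	memcpy(table, demo_template, sizeof(demo_template));

	/* Skip the terminator so its data pointer stays NULL. */
	for (i = 0; i < DEMO_ENTRIES - 1; i++)
		table[i].data = (char *)dev + (uintptr_t)table[i].data;

	return table;
}
```

Keeping one constant template and rebasing a copy per device means a new per-device knob only needs one more template entry plus a field in the device structure.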


* Re: [PATCH v2 3/3] mpls: Prevent use of implicit NULL label as outgoing label
  2015-04-22 10:14           ` [PATCH v2 3/3] mpls: Prevent use of implicit NULL label as outgoing label Robert Shearman
@ 2015-04-22 16:32             ` Eric W. Biederman
  0 siblings, 0 replies; 68+ messages in thread
From: Eric W. Biederman @ 2015-04-22 16:32 UTC (permalink / raw)
  To: Robert Shearman; +Cc: davem, netdev

Robert Shearman <rshearma@brocade.com> writes:

> The reserved implicit-NULL label isn't allowed to appear in the label
> stack for packets, so make it an error for the control plane to
> specify it as an outgoing label.

Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>

>
> Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
> Signed-off-by: Robert Shearman <rshearma@brocade.com>
> ---
>  net/mpls/af_mpls.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index 9fdd94cba83e..954810c76a86 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
> @@ -646,6 +646,15 @@ int nla_get_labels(const struct nlattr *nla,
>  		if ((dec.bos != bos) || dec.ttl || dec.tc)
>  			return -EINVAL;
>  
> +		switch (dec.label) {
> +		case LABEL_IMPLICIT_NULL:
> +			/* RFC3032: This is a label that an LSR may
> +			 * assign and distribute, but which never
> +			 * actually appears in the encapsulation.
> +			 */
> +			return -EINVAL;
> +		}
> +
>  		label[i] = dec.label;
>  	}
>  	*labels = nla_labels;
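
As an illustration of the check added above, here is a small userspace sketch of validating an outgoing label stack. It assumes the RFC 3032 label stack entry layout (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL) with entries already in host byte order. The `demo_*` names are hypothetical and this is not the kernel's `nla_get_labels`.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_LABEL_IMPLICIT_NULL 3u	/* reserved label value from RFC 3032 */

struct demo_entry_decoded {
	uint32_t label;	/* 20 bits */
	uint8_t ttl;
	uint8_t tc;	/* 3 bits */
	uint8_t bos;	/* bottom-of-stack flag */
};

/* Pack one label stack entry: label(20) | tc(3) | s(1) | ttl(8). */
static uint32_t demo_encode(uint32_t label, uint8_t tc, uint8_t bos,
			    uint8_t ttl)
{
	return (label << 12) | ((uint32_t)(tc & 0x7) << 9) |
	       ((uint32_t)(bos & 0x1) << 8) | ttl;
}

/* Unpack one label stack entry given in host byte order. */
static struct demo_entry_decoded demo_decode(uint32_t entry)
{
	struct demo_entry_decoded dec;

	dec.label = entry >> 12;
	dec.tc = (entry >> 9) & 0x7;
	dec.bos = (entry >> 8) & 0x1;
	dec.ttl = entry & 0xff;
	return dec;
}

/* Validate n outgoing entries and store the bare labels in labels[].
 * Returns 0 on success or -EINVAL if an entry is not acceptable.
 */
static int demo_check_out_labels(const uint32_t *entries, unsigned int n,
				 uint32_t *labels)
{
	unsigned int i;

	for (i = 0; i < n; i++) {
		struct demo_entry_decoded dec = demo_decode(entries[i]);
		uint8_t bos = (i == n - 1);

		/* Only the last entry may carry the bottom-of-stack flag,
		 * and the control plane must leave TTL and TC as zero.
		 */
		if (dec.bos != bos || dec.ttl || dec.tc)
			return -EINVAL;

		/* Implicit NULL never appears in the encapsulation. */
		if (dec.label == DEMO_LABEL_IMPLICIT_NULL)
			return -EINVAL;

		labels[i] = dec.label;
	}
	return 0;
}
```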


* Re: [PATCH v2 0/3] mpls: ABI changes for security and correctness
  2015-04-22 10:14         ` [PATCH v2 " Robert Shearman
                             ` (2 preceding siblings ...)
  2015-04-22 10:14           ` [PATCH v2 3/3] mpls: Prevent use of implicit NULL label as outgoing label Robert Shearman
@ 2015-04-22 16:47           ` Eric W. Biederman
  2015-04-22 18:25             ` David Miller
  3 siblings, 1 reply; 68+ messages in thread
From: Eric W. Biederman @ 2015-04-22 16:47 UTC (permalink / raw)
  To: Robert Shearman; +Cc: davem, netdev

Robert Shearman <rshearma@brocade.com> writes:

> V2:
>  - don't treat loopback interfaces specially by enabling mpls by
>    default
>
> For security, these changes stop mpls from being enabled by default
> on all interfaces when in use, and ensure that a label not valid as
> an outgoing label can't be added in mpls routes.
>
> This series contains three ABI/behaviour-affecting changes which have
> been split out from "[PATCH net-next v4 0/6] mpls: Behaviour-changing
> improvements" without any further modification. These changes need to
> be considered for 4.1; otherwise we'll be stuck with the current
> behaviour/ABI forever.

Thanks.

Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>

These patches all look good.  If these patches can make it into 4.1, so
that we don't have a stable release without the new
net/mpls/conf/<dev>/input sysctl, that would minimize disruption to
users of the mpls code.

Eric

> Robert Shearman (3):
>   mpls: Per-device MPLS state
>   mpls: Per-device enabling of packet input
>   mpls: Prevent use of implicit NULL label as outgoing label
>
>  Documentation/networking/mpls-sysctl.txt |   9 +++
>  include/linux/netdevice.h                |   4 +
>  net/mpls/af_mpls.c                       | 125 ++++++++++++++++++++++++++++++-
>  net/mpls/internal.h                      |   6 ++
>  4 files changed, 141 insertions(+), 3 deletions(-)
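
Since input is disabled by default with this series, each MPLS-facing interface has to be enabled explicitly. A minimal sketch of doing that from a C program follows, assuming the `net/mpls/conf/<dev>/input` sysctl is exposed under `/proc/sys` in the usual way. The `demo_*` helper names are hypothetical, and the write needs sufficient privileges and a kernel with these patches applied.

```c
#include <stdio.h>

/* Build "/proc/sys/net/mpls/conf/<ifname>/input" into buf.
 * Returns 0 on success, -1 if the path would not fit.
 */
static int demo_input_sysctl_path(char *buf, size_t len, const char *ifname)
{
	int n = snprintf(buf, len, "/proc/sys/net/mpls/conf/%s/input", ifname);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" or "0" to the per-interface knob.
 * Returns 0 on success, -1 on any failure.
 */
static int demo_set_mpls_input(const char *ifname, int enable)
{
	char path[128];
	FILE *f;
	int ok;

	if (demo_input_sysctl_path(path, sizeof(path), ifname))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(enable ? "1\n" : "0\n", f) >= 0;
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

For example, `demo_set_mpls_input("eth0", 1)` would attempt to enable MPLS input on a hypothetical `eth0`.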


* Re: [PATCH v2 0/3] mpls: ABI changes for security and correctness
  2015-04-22 16:47           ` [PATCH v2 0/3] mpls: ABI changes for security and correctness Eric W. Biederman
@ 2015-04-22 18:25             ` David Miller
  0 siblings, 0 replies; 68+ messages in thread
From: David Miller @ 2015-04-22 18:25 UTC (permalink / raw)
  To: ebiederm; +Cc: rshearma, netdev

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Wed, 22 Apr 2015 11:47:10 -0500

> Robert Shearman <rshearma@brocade.com> writes:
> 
>> V2:
>>  - don't treat loopback interfaces specially by enabling mpls by
>>    default
>>
>> For security, these changes stop mpls from being enabled by default
>> on all interfaces when in use, and ensure that a label not valid as
>> an outgoing label can't be added in mpls routes.
>>
>> This series contains three ABI/behaviour-affecting changes which have
>> been split out from "[PATCH net-next v4 0/6] mpls: Behaviour-changing
>> improvements" without any further modification. These changes need to
>> be considered for 4.1 otherwise we'll be stuck with the current
>> behaviour/ABI forever.
> 
> Thanks.
> 
> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
> 
> These patches all look good.  If these patches can make it into 4.1, so
> that we don't have a stable release without the new
> net/mpls/conf/<dev>/input sysctl, that would minimize disruption to
> users of the mpls code.

Series applied, thanks everyone.


end of thread

Thread overview: 68+ messages
2015-03-19 21:32 [PATCH net-next 0/5] mpls: Behaviour-changing improvements Robert Shearman
2015-03-19 21:32 ` [PATCH net-next 1/5] mpls: Use definition for reserved label checks Robert Shearman
2015-03-20  0:41   ` Eric W. Biederman
2015-03-20 14:12     ` Robert Shearman
2015-03-19 21:32 ` [PATCH net-next 2/5] mpls: Remove incorrect PHP comment Robert Shearman
2015-03-19 21:32 ` [PATCH net-next 3/5] mpls: Differentiate implicit-null and unlabeled neighbours Robert Shearman
2015-03-19 21:32 ` [PATCH net-next 4/5] mpls: Per-device enabling of packet forwarding Robert Shearman
2015-03-19 21:32 ` [PATCH net-next 5/5] mpls: Allow payload type to be associated with label routes Robert Shearman
2015-03-20 15:42 ` [PATCH net-next v2 0/5] mpls: Behaviour-changing improvements Robert Shearman
2015-03-20 15:42   ` [PATCH net-next v2 1/5] mpls: Use definition for reserved label checks Robert Shearman
2015-03-22 19:09     ` Eric W. Biederman
2015-03-20 15:42   ` [PATCH net-next v2 2/5] mpls: Remove incorrect PHP comment Robert Shearman
2015-03-22 19:12     ` Eric W. Biederman
2015-03-23 11:32       ` Robert Shearman
2015-03-23 18:16         ` Eric W. Biederman
2015-03-24 15:18           ` Robert Shearman
2015-03-24 18:43             ` Vivek Venkatraman
2015-03-20 15:42   ` [PATCH net-next v2 3/5] mpls: Differentiate implicit-null and unlabeled neighbours Robert Shearman
2015-03-22 19:49     ` Eric W. Biederman
2015-03-22 21:06       ` Eric W. Biederman
2015-03-23 11:47         ` Robert Shearman
2015-03-20 15:42   ` [PATCH net-next v2 4/5] mpls: Per-device enabling of packet forwarding Robert Shearman
2015-03-22 20:02     ` Eric W. Biederman
2015-03-22 20:34       ` Eric W. Biederman
2015-03-23 13:42         ` Robert Shearman
2015-03-23 13:10       ` Robert Shearman
2015-03-20 15:42   ` [PATCH net-next v2 5/5] mpls: Allow payload type to be associated with label routes Robert Shearman
2015-03-22 20:56     ` Eric W. Biederman
2015-03-23 14:02       ` Robert Shearman
2015-03-30 18:15   ` [PATCH net-next v3 0/4] mpls: Behaviour-changing improvements Robert Shearman
2015-03-30 18:15     ` [PATCH net-next v3 1/4] mpls: Use definition for reserved label checks Robert Shearman
2015-03-30 18:15     ` [PATCH net-next v3 2/4] mpls: Differentiate implicit-null and unlabeled neighbours Robert Shearman
2015-04-07 16:56       ` Eric W. Biederman
2015-04-08 17:08         ` Robert Shearman
2015-03-30 18:15     ` [PATCH net-next v3 3/4] mpls: Per-device enabling of packet input Robert Shearman
2015-04-07 17:02       ` Eric W. Biederman
2015-04-08 14:29         ` Robert Shearman
2015-04-08 14:44           ` Eric W. Biederman
2015-03-30 18:15     ` [PATCH net-next v3 4/4] mpls: Allow payload type to be associated with label routes Robert Shearman
2015-04-07 17:19       ` Eric W. Biederman
2015-04-08 14:03         ` Robert Shearman
2015-04-01 19:30     ` [PATCH net-next v3 0/4] mpls: Behaviour-changing improvements David Miller
2015-04-01 21:14       ` Eric W. Biederman
2015-04-01 23:49       ` Robert Shearman
2015-04-06 20:02     ` David Miller
2015-04-14 22:44     ` [PATCH net-next v4 0/6] " Robert Shearman
2015-04-14 22:44       ` [PATCH net-next v4 1/6] mpls: Use definition for reserved label checks Robert Shearman
2015-04-14 22:44       ` [PATCH net-next v4 2/6] mpls: Per-device MPLS state Robert Shearman
2015-04-14 22:45       ` [PATCH net-next v4 3/6] mpls: Per-device enabling of packet input Robert Shearman
2015-04-14 22:45       ` [PATCH net-next v4 4/6] mpls: Allow payload type to be associated with label routes Robert Shearman
2015-04-14 22:45       ` [PATCH net-next v4 5/6] mpls: Differentiate implicit-null and unlabeled neighbours Robert Shearman
2015-04-14 22:45       ` [PATCH net-next v4 6/6] mpls: Prevent use of implicit NULL label as outgoing label Robert Shearman
2015-04-21 20:34       ` [PATCH 0/3] mpls: ABI changes for security and correctness Robert Shearman
2015-04-21 20:34         ` [PATCH 1/3] mpls: Per-device MPLS state Robert Shearman
2015-04-21 20:34         ` [PATCH 2/3] mpls: Per-device enabling of packet input Robert Shearman
2015-04-21 20:34         ` [PATCH 3/3] mpls: Prevent use of implicit NULL label as outgoing label Robert Shearman
2015-04-22  0:29         ` [PATCH 0/3] mpls: ABI changes for security and correctness Eric W. Biederman
2015-04-22  2:12           ` David Miller
2015-04-22 10:10           ` Robert Shearman
2015-04-22 10:14         ` [PATCH v2 " Robert Shearman
2015-04-22 10:14           ` [PATCH v2 1/3] mpls: Per-device MPLS state Robert Shearman
2015-04-22 15:25             ` Eric W. Biederman
2015-04-22 10:14           ` [PATCH v2 2/3] mpls: Per-device enabling of packet input Robert Shearman
2015-04-22 16:27             ` Eric W. Biederman
2015-04-22 10:14           ` [PATCH v2 3/3] mpls: Prevent use of implicit NULL label as outgoing label Robert Shearman
2015-04-22 16:32             ` Eric W. Biederman
2015-04-22 16:47           ` [PATCH v2 0/3] mpls: ABI changes for security and correctness Eric W. Biederman
2015-04-22 18:25             ` David Miller
