netdev.vger.kernel.org archive mirror
* [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
@ 2013-10-04  2:34 Masatake YAMATO
  2013-10-08 19:23 ` David Miller
  0 siblings, 1 reply; 16+ messages in thread
From: Masatake YAMATO @ 2013-10-04  2:34 UTC (permalink / raw)
  To: netdev; +Cc: yamato

ip link has the ability to show extra information about a network
device if the kernel provides such information. With this patch the
veth driver can provide its peer ifindex information to the ip
command via the netlink interface.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
---
 drivers/net/veth.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index eee1f19..54187b9 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -434,6 +434,25 @@ static const struct nla_policy veth_policy[VETH_INFO_MAX + 1] = {
 	[VETH_INFO_PEER]	= { .len = sizeof(struct ifinfomsg) },
 };
 
+static size_t veth_get_size(const struct net_device *dev)
+{
+	return nla_total_size(sizeof(u64)) + /* VETH_INFO_PEER */
+		0;
+}
+
+static int veth_fill_info(struct sk_buff *skb, const struct net_device *dev)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+	struct net_device *peer = rtnl_dereference(priv->peer);
+	u64 peer_ifindex;
+
+	peer_ifindex = peer ? peer->ifindex : 0;
+	if (nla_put_u64(skb, VETH_INFO_PEER, peer_ifindex))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
 static struct rtnl_link_ops veth_link_ops = {
 	.kind		= DRV_NAME,
 	.priv_size	= sizeof(struct veth_priv),
@@ -443,6 +462,8 @@ static struct rtnl_link_ops veth_link_ops = {
 	.dellink	= veth_dellink,
 	.policy		= veth_policy,
 	.maxtype	= VETH_INFO_MAX,
+	.get_size	= veth_get_size,
+	.fill_info	= veth_fill_info,
 };
 
 /*
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
  2013-10-04  2:34 [PATCH] veth: Showing peer of veth type dev in ip link (kernel side) Masatake YAMATO
@ 2013-10-08 19:23 ` David Miller
  2013-10-08 21:13   ` Stephen Hemminger
  0 siblings, 1 reply; 16+ messages in thread
From: David Miller @ 2013-10-08 19:23 UTC (permalink / raw)
  To: yamato; +Cc: netdev

From: Masatake YAMATO <yamato@redhat.com>
Date: Fri,  4 Oct 2013 11:34:21 +0900

> ip link has the ability to show extra information about a network
> device if the kernel provides such information. With this patch the
> veth driver can provide its peer ifindex information to the ip
> command via the netlink interface.
> 
> Signed-off-by: Masatake YAMATO <yamato@redhat.com>

Applied to net-next, thank you.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
  2013-10-08 19:23 ` David Miller
@ 2013-10-08 21:13   ` Stephen Hemminger
  2013-10-09  1:52     ` David Miller
       [not found]     ` <20131009165254.2e1c8332@nehalam.linuxnetplumber.net>
  0 siblings, 2 replies; 16+ messages in thread
From: Stephen Hemminger @ 2013-10-08 21:13 UTC (permalink / raw)
  To: David Miller; +Cc: yamato, netdev

On Tue, 08 Oct 2013 15:23:49 -0400 (EDT)
David Miller <davem@davemloft.net> wrote:

> From: Masatake YAMATO <yamato@redhat.com>
> Date: Fri,  4 Oct 2013 11:34:21 +0900
> 
> > ip link has the ability to show extra information about a network
> > device if the kernel provides such information. With this patch the
> > veth driver can provide its peer ifindex information to the ip
> > command via the netlink interface.
> > 
> > Signed-off-by: Masatake YAMATO <yamato@redhat.com>
> 
> Applied to net-next, thank you.

Please revert this. It is incorrect.
The info returned by any netlink get message should match the message
used for setting.

I think the correct patch would be something like this (compile-tested only).

--- a/drivers/net/veth.c	2013-10-06 14:48:23.806461177 -0700
+++ b/drivers/net/veth.c	2013-10-08 14:11:42.434074690 -0700
@@ -434,6 +434,35 @@ static const struct nla_policy veth_poli
 	[VETH_INFO_PEER]	= { .len = sizeof(struct ifinfomsg) },
 };
 
+static size_t veth_get_size(const struct net_device *dev)
+{
+	return nla_total_size(sizeof(struct ifinfomsg)) + /* VETH_INFO_PEER */
+		0;
+}
+
+static int veth_fill_info(struct sk_buff *skb, const struct net_device *dev)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+	struct net_device *peer = rtnl_dereference(priv->peer);
+
+	if (peer) {
+		struct ifinfomsg ifi = {
+			.ifi_family = AF_UNSPEC,
+			.ifi_type = peer->type,
+			.ifi_index = peer->ifindex,
+			.ifi_flags = dev_get_flags(peer),
+		};
+
+		if (nla_put(skb, VETH_INFO_PEER, sizeof(ifi), &ifi))
+			goto nla_put_failure;
+	}
+
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
 static struct rtnl_link_ops veth_link_ops = {
 	.kind		= DRV_NAME,
 	.priv_size	= sizeof(struct veth_priv),
@@ -443,6 +472,8 @@ static struct rtnl_link_ops veth_link_op
 	.dellink	= veth_dellink,
 	.policy		= veth_policy,
 	.maxtype	= VETH_INFO_MAX,
+	.get_size	= veth_get_size,
+	.fill_info	= veth_fill_info,
 };
 
 /*

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
  2013-10-08 21:13   ` Stephen Hemminger
@ 2013-10-09  1:52     ` David Miller
       [not found]     ` <20131009165254.2e1c8332@nehalam.linuxnetplumber.net>
  1 sibling, 0 replies; 16+ messages in thread
From: David Miller @ 2013-10-09  1:52 UTC (permalink / raw)
  To: stephen; +Cc: yamato, netdev

From: Stephen Hemminger <stephen@networkplumber.org>
Date: Tue, 8 Oct 2013 14:13:37 -0700

> Please revert this. It is incorrect.

Ok, done.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
       [not found]     ` <20131009165254.2e1c8332@nehalam.linuxnetplumber.net>
@ 2013-10-10  0:17       ` Eric W. Biederman
  2013-10-15 16:44         ` Nicolas Dichtel
  0 siblings, 1 reply; 16+ messages in thread
From: Eric W. Biederman @ 2013-10-10  0:17 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, yamato, netdev

Stephen Hemminger <stephen@networkplumber.org> writes:

> On Tue, 8 Oct 2013 14:13:37 -0700
> Stephen Hemminger <stephen@networkplumber.org> wrote:
>
>> On Tue, 08 Oct 2013 15:23:49 -0400 (EDT)
>> David Miller <davem@davemloft.net> wrote:
>> 
>> > From: Masatake YAMATO <yamato@redhat.com>
>> > Date: Fri,  4 Oct 2013 11:34:21 +0900
>> > 
>> > > ip link has the ability to show extra information about a network
>> > > device if the kernel provides such information. With this patch the
>> > > veth driver can provide its peer ifindex information to the ip
>> > > command via the netlink interface.
>> > > 
>> > > Signed-off-by: Masatake YAMATO <yamato@redhat.com>
>> > 
>> > Applied to net-next, thank you.
>> 
>> Please revert this. It is incorrect.
>> The info returned by any netlink get message should match the message
>> used for setting.
>>
>> I think the correct patch would be something like this (compile-tested only).
>> 
>> --- a/drivers/net/veth.c	2013-10-06 14:48:23.806461177 -0700
>> +++ b/drivers/net/veth.c	2013-10-08 14:11:42.434074690 -0700
>> @@ -434,6 +434,35 @@ static const struct nla_policy veth_poli
>>  	[VETH_INFO_PEER]	= { .len = sizeof(struct ifinfomsg) },
>>  };
>>  
>> +static size_t veth_get_size(const struct net_device *dev)
>> +{
>> +	return nla_total_size(sizeof(struct ifinfomsg)) + /* VETH_INFO_PEER */
>> +		0;
>> +}
>> +
>> +static int veth_fill_info(struct sk_buff *skb, const struct net_device *dev)
>> +{
>> +	struct veth_priv *priv = netdev_priv(dev);
>> +	struct net_device *peer = rtnl_dereference(priv->peer);
>> +
>> +	if (peer) {
>> +		struct ifinfomsg ifi = {
>> +			.ifi_family = AF_UNSPEC,
>> +			.ifi_type = peer->type,
>> +			.ifi_index = peer->ifindex,
>> +			.ifi_flags = dev_get_flags(peer),
>> +		};
>> +
>> +		if (nla_put(skb, VETH_INFO_PEER, sizeof(ifi), &ifi))
>> +			goto nla_put_failure;
>> +	}
>> +
>> +	return 0;
>> +
>> +nla_put_failure:
>> +	return -EMSGSIZE;
>> +}
>> +
>>  static struct rtnl_link_ops veth_link_ops = {
>>  	.kind		= DRV_NAME,
>>  	.priv_size	= sizeof(struct veth_priv),
>> @@ -443,6 +472,8 @@ static struct rtnl_link_ops veth_link_op
>>  	.dellink	= veth_dellink,
>>  	.policy		= veth_policy,
>>  	.maxtype	= VETH_INFO_MAX,
>> +	.get_size	= veth_get_size,
>> +	.fill_info	= veth_fill_info,
>>  };
>>  
>>  /*
>> 
>> 
>
> This patch is ok as an RFC starting point, but the full implementation needs to
> add on IFLA_IFNAME and other attributes such that the full peer can be reconstructed.
>
> Ideally, the output of the 'ip link' command can be in a format that can be used
> to recreate the same veth pair.
>
> One issue is that veth has the ability to make a peer in a different namespace,
> and the network namespace code does not appear to have the ability to be invertible.
> I.e. it is not possible to construct IFLA_NET_NS_PID or IFLA_NET_NS_FD attributes
> from an existing network device's namespace.

Right.

IFLA_NET_NS_PID is not invertible as there may be no processes running
in a pid namespace.

IFLA_NET_NS_FD is in principle invertible.  We just need to add a file
descriptor to the caller's fd table.  I don't see IFLA_NET_NS_FD being
invertible for broadcast messages, but for unicast it looks like a bit
of a pain but there are no fundamental problems.

I don't know if we care enough yet to write the code for the
IFLA_NET_NS_FD attribute but it is doable.

Eric

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
  2013-10-10  0:17       ` Eric W. Biederman
@ 2013-10-15 16:44         ` Nicolas Dichtel
  2013-10-15 20:34           ` Eric W. Biederman
  0 siblings, 1 reply; 16+ messages in thread
From: Nicolas Dichtel @ 2013-10-15 16:44 UTC (permalink / raw)
  To: Eric W. Biederman, Stephen Hemminger; +Cc: David Miller, yamato, netdev

On 10/10/2013 02:17, Eric W. Biederman wrote:
> Stephen Hemminger <stephen@networkplumber.org> writes:
>
>> On Tue, 8 Oct 2013 14:13:37 -0700
>> Stephen Hemminger <stephen@networkplumber.org> wrote:
>>
>>> On Tue, 08 Oct 2013 15:23:49 -0400 (EDT)
>>> David Miller <davem@davemloft.net> wrote:
>>>
>>>> From: Masatake YAMATO <yamato@redhat.com>
>>>> Date: Fri,  4 Oct 2013 11:34:21 +0900
>>>>
>>>>> ip link has the ability to show extra information about a network
>>>>> device if the kernel provides such information. With this patch the
>>>>> veth driver can provide its peer ifindex information to the ip
>>>>> command via the netlink interface.
>>>>>
>>>>> Signed-off-by: Masatake YAMATO <yamato@redhat.com>
>>>>
>>>> Applied to net-next, thank you.
>>>
>>> Please revert this. It is incorrect.
>>> The info returned by any netlink get message should match the message
>>> used for setting.
>>>
>>> I think the correct patch would be something like this (compile-tested only).
>>>
>>> --- a/drivers/net/veth.c	2013-10-06 14:48:23.806461177 -0700
>>> +++ b/drivers/net/veth.c	2013-10-08 14:11:42.434074690 -0700
>>> @@ -434,6 +434,35 @@ static const struct nla_policy veth_poli
>>>   	[VETH_INFO_PEER]	= { .len = sizeof(struct ifinfomsg) },
>>>   };
>>>
>>> +static size_t veth_get_size(const struct net_device *dev)
>>> +{
>>> +	return nla_total_size(sizeof(struct ifinfomsg)) + /* VETH_INFO_PEER */
>>> +		0;
>>> +}
>>> +
>>> +static int veth_fill_info(struct sk_buff *skb, const struct net_device *dev)
>>> +{
>>> +	struct veth_priv *priv = netdev_priv(dev);
>>> +	struct net_device *peer = rtnl_dereference(priv->peer);
>>> +
>>> +	if (peer) {
>>> +		struct ifinfomsg ifi = {
>>> +			.ifi_family = AF_UNSPEC,
>>> +			.ifi_type = peer->type,
>>> +			.ifi_index = peer->ifindex,
>>> +			.ifi_flags = dev_get_flags(peer),
>>> +		};
>>> +
>>> +		if (nla_put(skb, VETH_INFO_PEER, sizeof(ifi), &ifi))
>>> +			goto nla_put_failure;
>>> +	}
>>> +
>>> +	return 0;
>>> +
>>> +nla_put_failure:
>>> +	return -EMSGSIZE;
>>> +}
>>> +
>>>   static struct rtnl_link_ops veth_link_ops = {
>>>   	.kind		= DRV_NAME,
>>>   	.priv_size	= sizeof(struct veth_priv),
>>> @@ -443,6 +472,8 @@ static struct rtnl_link_ops veth_link_op
>>>   	.dellink	= veth_dellink,
>>>   	.policy		= veth_policy,
>>>   	.maxtype	= VETH_INFO_MAX,
>>> +	.get_size	= veth_get_size,
>>> +	.fill_info	= veth_fill_info,
>>>   };
>>>
>>>   /*
>>>
>>>
>>
>> This patch is ok as an RFC starting point, but the full implementation needs to
>> add on IFLA_IFNAME and other attributes such that the full peer can be reconstructed.
>>
>> Ideally, the output of the 'ip link' command can be in a format that can be used
>> to recreate the same veth pair.
>>
>> One issue is that veth has the ability to make a peer in a different namespace,
>> and the network namespace code does not appear to have the ability to be invertible.
>> I.e. it is not possible to construct IFLA_NET_NS_PID or IFLA_NET_NS_FD attributes
>> from an existing network device's namespace.
>
> Right.
>
> IFLA_NET_NS_PID is not invertible as there may be no processes running
> in a pid namespace.
>
> IFLA_NET_NS_FD is in principle invertible.  We just need to add a file
> descriptor to the caller's fd table.  I don't see IFLA_NET_NS_FD being
> invertible for broadcast messages, but for unicast it looks like a bit
> of a pain but there are no fundamental problems.
I'm not sure I understand why it is invertible only for unicast messages.
Or are you saying that it is invertible only for the netns where the caller
stands (and thus not for the veth peer)?

>
> I don't know if we care enough yet to write the code for the
> IFLA_NET_NS_FD attribute but it is doable.
I care ;-)
Has somebody already started to write a patch?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
  2013-10-15 16:44         ` Nicolas Dichtel
@ 2013-10-15 20:34           ` Eric W. Biederman
  2013-10-16 10:08             ` Nicolas Dichtel
  0 siblings, 1 reply; 16+ messages in thread
From: Eric W. Biederman @ 2013-10-15 20:34 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: Stephen Hemminger, David Miller, yamato, netdev

Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:

> On 10/10/2013 02:17, Eric W. Biederman wrote:
>>
>> Right.
>>
>> IFLA_NET_NS_PID is not invertible as there may be no processes running
>> in a pid namespace.
>>
>> IFLA_NET_NS_FD is in principle invertible.  We just need to add a file
>> descriptor to the caller's fd table.  I don't see IFLA_NET_NS_FD being
>> invertible for broadcast messages, but for unicast it looks like a bit
>> of a pain but there are no fundamental problems.
> I'm not sure I understand why it is invertible only for unicast messages.
> Or are you saying that it is invertible only for the netns where the
> caller stands (and thus not for the veth peer)?

The pain is that it is a special case of SCM_RIGHTS aka passing file
descriptors.  Right now we don't support SCM_RIGHTS on netlink sockets
and so from that perspective IFLA_NET_NS_FD is a bit of a hack.

For unicast messages we can just stuff a file descriptor in the calling
process and be done with it.  For multicast messages we have to be much
more complete.
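
For reference, the set side that a future get would have to mirror is
already simple from userspace. A hedged sketch, using the iproute2-style
addattr32() helper; the netns path, the "vr1" name, and the req request
buffer are illustrative assumptions, not code from this thread:

	#include <fcntl.h>

	int nsfd = open("/var/run/netns/vr1", O_RDONLY);

	/* placed inside the VETH_INFO_PEER nest of an RTM_NEWLINK request;
	 * req is the usual iproute2-style request buffer (assumed here) */
	addattr32(&req.n, sizeof(req), IFLA_NET_NS_FD, nsfd);

The inverse direction is the part that needs new plumbing: the kernel
would have to open the peer's netns and install such an fd into the
reader's fd table.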

>> I don't know if we care enough yet to write the code for the
>> IFLA_NET_NS_FD attribute but it is doable.
> I care ;-)
> Has somebody already started to write a patch?

For IFLA_NET_NS_FD not that I know of.

Mostly it is doable but there are some silly cases.
- Do we need to actually implement SCM_RIGHTS to prevent people
  accepting file descriptors unknowingly and hitting their file
  descriptor limits?

  In which case we need to call the attribute IFLA_NET_NS_SCM_FD
  so we know it is just an index into the passed file descriptors.

- Do we need an extra permission check to prevent keeping a network
  namespace alive longer than necessary?  Aka there are some permission
  checks when opening and bind mounting /proc/<pid>/ns/net; do we need
  a similar check?  Perhaps we would need to require CAP_NET_ADMIN over
  the target network namespace.

Beyond that it is just the logistics to open what is essentially
/proc/<pid>/ns/net and add it to the file descriptor table of the
requesting process.  Exactly which mount of proc we are going to
find the appropriate file to open I don't know.

It isn't likely to be lots of code but it is code that the necessary
infrastructure is not in place for, and a bunch of moderately hairy
corner cases to deal with.

Eric

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
  2013-10-15 20:34           ` Eric W. Biederman
@ 2013-10-16 10:08             ` Nicolas Dichtel
  2013-10-16 19:53               ` Eric W. Biederman
  0 siblings, 1 reply; 16+ messages in thread
From: Nicolas Dichtel @ 2013-10-16 10:08 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Stephen Hemminger, David Miller, yamato, netdev

On 15/10/2013 22:34, Eric W. Biederman wrote:
> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>
>> On 10/10/2013 02:17, Eric W. Biederman wrote:
>>>
>>> Right.
>>>
>>> IFLA_NET_NS_PID is not invertible as there may be no processes running
>>> in a pid namespace.
>>>
>>> IFLA_NET_NS_FD is in principle invertible.  We just need to add a file
>>> descriptor to the caller's fd table.  I don't see IFLA_NET_NS_FD being
>>> invertible for broadcast messages, but for unicast it looks like a bit
>>> of a pain but there are no fundamental problems.
>> I'm not sure I understand why it is invertible only for unicast messages.
>> Or are you saying that it is invertible only for the netns where the
>> caller stands (and thus not for the veth peer)?
>
> The pain is that it is a special case of SCM_RIGHTS aka passing file
> descriptors.  Right now we don't support SCM_RIGHTS on netlink sockets
> and so from that perspective IFLA_NET_NS_FD is a bit of a hack.
>
> For unicast messages we can just stuff a file descriptor in the calling
> process and be done with it.  For multicast messages we have to be much
> more complete.
>
>>> I don't know if we care enough yet to write the code for the
>>> IFLA_NET_NS_FD attribute but it is doable.
>> I care ;-)
>> Has somebody already started to write a patch?
>
> For IFLA_NET_NS_FD not that I know of.
>
> Mostly it is doable but there are some silly cases.
> - Do we need to actually implement SCM_RIGHTS to prevent people
>    accepting file descriptors unknowingly and hitting their file
>    descriptor limits?
>
>    In which case we need to call the attribute IFLA_NET_NS_SCM_FD
>    so we know it is just an index into the passed file descriptors.
>
> - Do we need an extra permission check to prevent keeping a network
>    namespace alive longer than necessary?  Aka there are some permission
>    checks when opening and bind mounting /proc/<pid>/ns/net; do we need
>    a similar check?  Perhaps we would need to require CAP_NET_ADMIN over
>    the target network namespace.
>
> Beyond that it is just the logistics to open what is essentially
> /proc/<pid>/ns/net and add it to the file descriptor table of the
> requesting process.  Exactly which mount of proc we are going to
> find the appropriate file to open I don't know.
>
> It isn't likely to be lots of code but it is code that the necessary
> infrastructure is not in place for, and a bunch of moderately hairy
> corner cases to deal with.
Got it. This doesn't seem the simplest/best way to resolve this problem.
Can we not introduce another identifier (something like IFLA_NET_NS_ID)
which would not have such a constraint?
The inode is unique on the system, so why not use it as an opaque value to
identify the netns (as 'ip netns identify' does)?
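
A sketch of that identification (this is essentially what 'ip netns
identify' compares today; IFLA_NET_NS_ID above is only a proposed name,
not an existing kernel symbol):

	#include <stdio.h>
	#include <sys/stat.h>

	struct stat st;

	/* every process sees its own netns as a /proc ns file */
	if (stat("/proc/self/ns/net", &st) == 0)
		printf("netns inode: %lu\n", (unsigned long) st.st_ino);

The st_ino value would then be carried as the opaque identifier.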

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
  2013-10-16 10:08             ` Nicolas Dichtel
@ 2013-10-16 19:53               ` Eric W. Biederman
  2013-10-17 16:05                 ` Nicolas Dichtel
  0 siblings, 1 reply; 16+ messages in thread
From: Eric W. Biederman @ 2013-10-16 19:53 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: Stephen Hemminger, David Miller, yamato, netdev

Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:

> On 15/10/2013 22:34, Eric W. Biederman wrote:

>> For IFLA_NET_NS_FD not that I know of.
>>
>> Mostly it is doable but there are some silly cases.
>> - Do we need to actually implement SCM_RIGHTS to prevent people
>>    accepting file descriptors unknowingly and hitting their file
>>    descriptor limits?
>>
>>    In which case we need to call the attribute IFLA_NET_NS_SCM_FD
>>    so we know it is just an index into the passed file descriptors.
>>
>> - Do we need an extra permission check to prevent keeping a network
>>    namespace alive longer than necessary?  Aka there are some permission
>>    checks when opening and bind mounting /proc/<pid>/ns/net; do we need
>>    a similar check?  Perhaps we would need to require CAP_NET_ADMIN over
>>    the target network namespace.
>>
>> Beyond that it is just the logistics to open what is essentially
>> /proc/<pid>/ns/net and add it to the file descriptor table of the
>> requesting process.  Exactly which mount of proc we are going to
>> find the appropriate file to open I don't know.
>>
>> It isn't likely to be lots of code but it is code that the necessary
>> infrastructure is not in place for, and a bunch of moderately hairy
>> corner cases to deal with.
> Got it. This doesn't seem the simplest/best way to resolve this problem.
> Can we not introduce another identifier (something like IFLA_NET_NS_ID)
> which would not have such a constraint?
> The inode is unique on the system, so why not use it as an opaque value to
> identify the netns (as 'ip netns identify' does)?

The age-old question: why can't we have global identifiers for
namespaces?

The answer is that I don't want to implement a namespace for namespaces.

While the proc inode does work today across different mounts of proc, I
reserve the right at some future date (if it solves a technical problem)
to give each namespace a different inode number in each different mount
of proc.  So the inode number is not quite the unique identifier you
want.  The inode number is as close as I am willing to get to a namespace
of namespaces.

I think the simplest solution is to just not worry about which namespace
the other half of a veth pair is in.  But I have not encountered the
problem where I need to know exactly which namespace we are worrying
about.

Global identifiers are easy until you hit the cases where they make
things impossible.

Eric

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
  2013-10-16 19:53               ` Eric W. Biederman
@ 2013-10-17 16:05                 ` Nicolas Dichtel
  2013-10-17 19:28                   ` Eric W. Biederman
  0 siblings, 1 reply; 16+ messages in thread
From: Nicolas Dichtel @ 2013-10-17 16:05 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Stephen Hemminger, David Miller, yamato, netdev

On 16/10/2013 21:53, Eric W. Biederman wrote:
> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>
>> On 15/10/2013 22:34, Eric W. Biederman wrote:
>
>>> For IFLA_NET_NS_FD not that I know of.
>>>
>>> Mostly it is doable but there are some silly cases.
>>> - Do we need to actually implement SCM_RIGHTS to prevent people
>>>     accepting file descriptors unknowingly and hitting their file
>>>     descriptor limits?
>>>
>>>     In which case we need to call the attribute IFLA_NET_NS_SCM_FD
>>>     so we know it is just an index into the passed file descriptors.
>>>
>>> - Do we need an extra permission check to prevent keeping a network
>>>     namespace alive longer than necessary?  Aka there are some permission
>>>     checks when opening and bind mounting /proc/<pid>/ns/net; do we need
>>>     a similar check?  Perhaps we would need to require CAP_NET_ADMIN over
>>>     the target network namespace.
>>>
>>> Beyond that it is just the logistics to open what is essentially
>>> /proc/<pid>/ns/net and add it to the file descriptor table of the
>>> requesting process.  Exactly which mount of proc we are going to
>>> find the appropriate file to open I don't know.
>>>
>>> It isn't likely to be lots of code but it is code that the necessary
>>> infrastructure is not in place for, and a bunch of moderately hairy
>>> corner cases to deal with.
>> Got it. This doesn't seem the simplest/best way to resolve this problem.
>> Can we not introduce another identifier (something like IFLA_NET_NS_ID)
>> which would not have such a constraint?
>> The inode is unique on the system, so why not use it as an opaque value to
>> identify the netns (as 'ip netns identify' does)?
>
> The age-old question: why can't we have global identifiers for
> namespaces?
>
> The answer is that I don't want to implement a namespace for namespaces.
Sorry, but I don't understand the problem. This ID is owned by the kernel, like
the netns list (for_each_net()) is owned by it.

>
> While the proc inode does work today across different mounts of proc, I
> reserve the right at some future date (if it solves a technical problem)
> to give each namespace a different inode number in each different mount
> of proc.  So the inode number is not quite the unique identifier you
> want.  The inode number is as close as I am willing to get to a namespace
> of namespaces.
>
> I think the simplest solution is to just not worry about which namespace
> the other half of a veth pair is in.  But I have not encountered the
> problem where I need to know exactly which namespace we are worrying
> about.
Ok, let's start by explaining our use case.

We are using namespaces only to implement virtual routers (VR), i.e. only
the networking stack is virtualized. We don't care about other namespaces, we
just want to run several network stacks and be able to manage them.

For example, providers use this feature to isolate clients: one VR is opened
for each client. You can have a large number of clients (10,000+) and thus the
same number of netns.
Considering these numbers, we don't want to run one instance per VR for all of
our network daemons, but have only one instance that manages all VRs.

You also have daemons that monitor the system and synchronize network objects
(interfaces, routes, etc.) on another Linux box. The goal is to implement a
high availability system: it's possible to switch to the other box to avoid
service interruption.
This kind of daemon wants to have the full information about interfaces to be
able to build/configure them on the other box.
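
Such a daemon typically watches link events over rtnetlink; a minimal
sketch of the subscription side (standard rtnetlink, nothing
veth-specific about it):

	#include <sys/socket.h>
	#include <linux/netlink.h>
	#include <linux/rtnetlink.h>

	int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	struct sockaddr_nl sa = {
		.nl_family = AF_NETLINK,
		.nl_groups = RTMGRP_LINK,	/* link add/del/change events */
	};

	bind(fd, (struct sockaddr *) &sa, sizeof(sa));

The attributes carried in those RTM_NEWLINK messages are exactly what
needs to be complete enough to rebuild the interface on the peer box.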

>
> Global identifiers are easy until you hit the cases where they make
> things impossible.
I don't especially want to use an ID, but I fear that the solution with file
descriptors will be a nightmare.


Nicolas

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
  2013-10-17 16:05                 ` Nicolas Dichtel
@ 2013-10-17 19:28                   ` Eric W. Biederman
  2013-10-18 15:34                     ` Nicolas Dichtel
  0 siblings, 1 reply; 16+ messages in thread
From: Eric W. Biederman @ 2013-10-17 19:28 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: Stephen Hemminger, David Miller, yamato, netdev

Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:

> On 16/10/2013 21:53, Eric W. Biederman wrote:

>> The age-old question: why can't we have global identifiers for
>> namespaces?
>>
>> The answer is that I don't want to implement a namespace for namespaces.
> Sorry, but I don't understand the problem. This ID is owned by the kernel, like
> the netns list (for_each_net()) is owned by it.

The scenario where problems are likely to show up is something like this.

For testing it would be reasonable to set up two Linux containers that
look like full Linux systems.  In those containers you run one instance
of your virtual router management daemons, and you arrange to synchronize
between the two Linux containers for testing.

It becomes even more interesting when we want to migrate one of those
Linux containers to another physical machine.

Global identifiers start breaking the first scenario, and really trash
the second scenario.

At the same time, migration of configuration and replication of
configuration are essentially the same problem, so it would be very
silly to design something that will cause problems.

>> While the proc inode does work today across different mounts of proc, I
>> reserve the right at some future date (if it solves a technical problem)
>> to give each namespace a different inode number in each different mount
>> of proc.  So the inode number is not quite the unique identifier you
>> want.  The inode number is as close as I am willing to get to a namespace
>> of namespaces.
>>
>> I think the simplest solution is to just not worry about which namespace
>> the other half of a veth pair is in.  But I have not encountered the
>> problem where I need to know exactly which namespace we are worrying
>> about.
> Ok, let's start by explaining our use case.
>
> We are using namespaces only to implement virtual routers (VR), i.e. only
> the networking stack is virtualized. We don't care about other namespaces, we
> just want to run several network stacks and be able to manage them.
>
> For example, providers use this feature to isolate clients: one VR is opened
> for each client. You can have a large number of clients (10,000+) and thus the
> same number of netns.
> Considering these numbers, we don't want to run one instance per VR for all of
> our network daemons, but have only one instance that manages all VRs.
>
> You also have daemons that monitor the system and synchronize network objects
> (interfaces, routes, etc.) on another Linux box. The goal is to implement a
> high availability system: it's possible to switch to the other box to avoid
> service interruption.
> This kind of daemon wants to have the full information about interfaces to be
> able to build/configure them on the other box.
>
>>
>> Global identifiers are easy until you hit the cases where they make
>> things impossible.
> I don't especially want to use an ID, but I fear that the solution with file
> descriptors will be a nightmare.

I can certainly see challenges.  In asking for symmetry between set and
get, the solution with file descriptors is the obvious answer and the
first answer I have been able to come up with so far.

My original answer was that the ifindex happened to be unique across
namespaces but that actually turned out to be a problem for migration
so that was abandoned.

Namespace file descriptors are the solution that I know semantically
will work.  Beyond that I don't have any good ideas right now.

I just know that local names (aka file descriptors) are much easier to
work with semantically than global names.

Eric

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
  2013-10-17 19:28                   ` Eric W. Biederman
@ 2013-10-18 15:34                     ` Nicolas Dichtel
  0 siblings, 0 replies; 16+ messages in thread
From: Nicolas Dichtel @ 2013-10-18 15:34 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Stephen Hemminger, David Miller, yamato, netdev

On 17/10/2013 21:28, Eric W. Biederman wrote:
> Nicolas Dichtel <nicolas.dichtel@6wind.com> writes:
>
>> On 16/10/2013 21:53, Eric W. Biederman wrote:
>
>>> The age-old question: why can't we have global identifiers for
>>> namespaces?
>>>
>>> The answer is that I don't want to implement a namespace for namespaces.
>> Sorry, but I don't understand the problem. This ID is owned by the kernel, like
>> the netns list (for_each_net()) is owned by it.
>
> The scenario where problems are likely to show up is something like this.
>
> For testing it would be reasonable to set up two Linux containers that
> look like full Linux systems.  In those containers you run one instance
> of your virtual router management daemons, and you arrange to synchronize
> between the two Linux containers for testing.
>
> It becomes even more interesting when we want to migrate one of those
> Linux containers to another physical machine.
>
> Global identifiers start breaking the first scenario, and really trash
> the second scenario.
>
> At the same time, migration of configuration and replication of
> configuration are essentially the same problem, so it would be very
> silly to design something that will cause problems.
Ok, I'm now convinced ;-)

>
>>> While the proc inode does work today across different mounts of proc, I
>>> reserve the right at some future date (if it solves a technical problem)
>>> to give each namespace a different inode number in each different mount
>>> of proc.  So the inode number is not quite the unique identifier you
>>> want.  The inode number is as close as I am willing to get to a namespace
>>> of namespaces.
>>>
>>> I think the simplest solution is to just not worry about which namespace
>>> the other half of a veth pair is in.  But I have not encountered the
>>> problem where I need to know exactly which namespace we are worrying
>>> about.
>> Ok, let's start by explaining our use case.
>>
>> We are using namespaces only to implement virtual routers (VR), i.e. only
>> the networking stack is virtualized. We don't care about other namespaces, we
>> just want to run several network stacks and be able to manage them.
>>
>> For example, providers use this feature to isolate clients: one VR is opened
>> for each client. You can have a large number of clients (10,000+) and thus the
>> same number of netns.
>> Considering these numbers, we don't want to run one instance per VR for all of
>> our network daemons, but have only one instance that manages all VRs.
>>
>> You also have daemons that monitor the system and synchronize network objects
>> (interfaces, routes, etc.) on another Linux box. The goal is to implement a
>> high availability system: it's possible to switch to the other box to avoid
>> service interruption.
>> This kind of daemon wants to have the full information about interfaces to be
>> able to build/configure them on the other box.
>>
>>>
>>> Global identifiers are easy until you hit the cases where they make
>>> things impossible.
>> I don't especially want to use an ID, but I fear that the solution with file
>> descriptors will be a nightmare.
>
> I can certainly see challenges.  In asking for symmetry between set and
> get, the solution with file descriptors is the obvious answer and the
> first answer I have been able to come up with so far.
>
> My original answer was that the ifindex happened to be unique across
> namespaces but that actually turned out to be a problem for migration
> so that was abandoned.
>
> Namespace file descriptors are the solution that I know semantically
> will work.  Beyond that I don't have any good ideas right now.
>
> I just know that local names (aka file descriptors) are much easier to
> work with semantically than global names.
Yes sure. I will continue to think about this.


Thank you,
Nicolas

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
  2013-10-04  4:05 Masatake YAMATO
  2013-10-04  4:49 ` Eric Dumazet
  2013-10-04 15:21 ` Nicolas Dichtel
@ 2013-10-04 17:55 ` Stephen Hemminger
  2 siblings, 0 replies; 16+ messages in thread
From: Stephen Hemminger @ 2013-10-04 17:55 UTC (permalink / raw)
  To: Masatake YAMATO; +Cc: netdev

On Fri,  4 Oct 2013 13:05:29 +0900
Masatake YAMATO <yamato@redhat.com> wrote:

> ip link has the ability to show extra information about a network
> device if the kernel provides such information. With this patch the
> veth driver can provide its peer ifindex information to the ip
> command via the netlink interface.
> 
> Signed-off-by: Masatake YAMATO <yamato@redhat.com>
> ---

netlink APIs are supposed to be symmetrical.

When creating a veth, the VETH_INFO_PEER attribute is a struct ifinfomsg.
The fill_info should tack on the same data.
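
A rough sketch of what that symmetry means on the wire (simplified; on
create, the peer's ifinfomsg may be followed by nested IFLA_* attributes
such as IFLA_IFNAME):

	struct nlattr hdr;	/* nla_type == VETH_INFO_PEER */
	struct ifinfomsg ifi;	/* the peer: ifi_index, ifi_flags, ... */
	/* optional nested IFLA_* attributes for the peer follow */

so fill_info should emit a struct ifinfomsg for the peer rather than a
bare integer.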

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
  2013-10-04  4:05 Masatake YAMATO
  2013-10-04  4:49 ` Eric Dumazet
@ 2013-10-04 15:21 ` Nicolas Dichtel
  2013-10-04 17:55 ` Stephen Hemminger
  2 siblings, 0 replies; 16+ messages in thread
From: Nicolas Dichtel @ 2013-10-04 15:21 UTC (permalink / raw)
  To: Masatake YAMATO, netdev

On 04/10/2013 06:05, Masatake YAMATO wrote:
> ip link has the ability to show extra information about a network
> device if the kernel provides such information. With this patch the
> veth driver can provide its peer ifindex information to the ip
> command via the netlink interface.
But the ifindex can be interpreted only when you know the related netns, and
the veth peer is probably in another netns.
How do you find the netdevice in userland in this case?
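
To illustrate the problem: an ifindex only resolves inside one netns, so
something like the sketch below (peer_ifindex stands for the value from
the new attribute) works only if the process is already in the peer's
netns:

	#include <net/if.h>
	#include <stdio.h>

	char name[IF_NAMESIZE];

	/* resolves against the *current* netns only */
	if (if_indextoname(peer_ifindex, name) == NULL)
		perror("if_indextoname");

and nothing in the attribute says which netns to enter first.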

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
  2013-10-04  4:05 Masatake YAMATO
@ 2013-10-04  4:49 ` Eric Dumazet
  2013-10-04 15:21 ` Nicolas Dichtel
  2013-10-04 17:55 ` Stephen Hemminger
  2 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2013-10-04  4:49 UTC (permalink / raw)
  To: Masatake YAMATO; +Cc: netdev

On Fri, 2013-10-04 at 13:05 +0900, Masatake YAMATO wrote:
> ip link has the ability to show extra information about a network
> device if the kernel provides such information. With this patch the
> veth driver can provide its peer ifindex information to the ip
> command via the netlink interface.

> +
> +	peer_ifindex = peer ? peer->ifindex : 0;
> +	if (nla_put_u64(skb, VETH_INFO_PEER, peer_ifindex))
> +		return -EMSGSIZE;
> +

ifindex is an int, so you should use nla_put_u32()
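
A minimal sketch of that change against the posted patch (untested):

	u32 peer_ifindex = peer ? peer->ifindex : 0;

	if (nla_put_u32(skb, VETH_INFO_PEER, peer_ifindex))
		return -EMSGSIZE;

with veth_get_size() returning nla_total_size(sizeof(u32)) to match.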

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] veth: Showing peer of veth type dev in ip link (kernel side)
@ 2013-10-04  4:05 Masatake YAMATO
  2013-10-04  4:49 ` Eric Dumazet
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Masatake YAMATO @ 2013-10-04  4:05 UTC (permalink / raw)
  To: netdev; +Cc: yamato

ip link has the ability to show extra information about a network
device if the kernel provides such information. With this patch the
veth driver can provide its peer ifindex information to the ip
command via the netlink interface.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
---
 drivers/net/veth.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index eee1f19..54187b9 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -434,6 +434,25 @@ static const struct nla_policy veth_policy[VETH_INFO_MAX + 1] = {
 	[VETH_INFO_PEER]	= { .len = sizeof(struct ifinfomsg) },
 };
 
+static size_t veth_get_size(const struct net_device *dev)
+{
+	return nla_total_size(sizeof(u64)) + /* VETH_INFO_PEER */
+		0;
+}
+
+static int veth_fill_info(struct sk_buff *skb, const struct net_device *dev)
+{
+	struct veth_priv *priv = netdev_priv(dev);
+	struct net_device *peer = rtnl_dereference(priv->peer);
+	u64 peer_ifindex;
+
+	peer_ifindex = peer ? peer->ifindex : 0;
+	if (nla_put_u64(skb, VETH_INFO_PEER, peer_ifindex))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
 static struct rtnl_link_ops veth_link_ops = {
 	.kind		= DRV_NAME,
 	.priv_size	= sizeof(struct veth_priv),
@@ -443,6 +462,8 @@ static struct rtnl_link_ops veth_link_ops = {
 	.dellink	= veth_dellink,
 	.policy		= veth_policy,
 	.maxtype	= VETH_INFO_MAX,
+	.get_size	= veth_get_size,
+	.fill_info	= veth_fill_info,
 };
 
 /*
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2013-10-18 15:34 UTC | newest]

Thread overview: 16+ messages
-- links below jump to the message on this page --
2013-10-04  2:34 [PATCH] veth: Showing peer of veth type dev in ip link (kernel side) Masatake YAMATO
2013-10-08 19:23 ` David Miller
2013-10-08 21:13   ` Stephen Hemminger
2013-10-09  1:52     ` David Miller
     [not found]     ` <20131009165254.2e1c8332@nehalam.linuxnetplumber.net>
2013-10-10  0:17       ` Eric W. Biederman
2013-10-15 16:44         ` Nicolas Dichtel
2013-10-15 20:34           ` Eric W. Biederman
2013-10-16 10:08             ` Nicolas Dichtel
2013-10-16 19:53               ` Eric W. Biederman
2013-10-17 16:05                 ` Nicolas Dichtel
2013-10-17 19:28                   ` Eric W. Biederman
2013-10-18 15:34                     ` Nicolas Dichtel
2013-10-04  4:05 Masatake YAMATO
2013-10-04  4:49 ` Eric Dumazet
2013-10-04 15:21 ` Nicolas Dichtel
2013-10-04 17:55 ` Stephen Hemminger
