[PATCH next 3/3] net: Use l3_dev instead of skb->dev for L3 processing

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH next 3/3] net: Use l3_dev instead of skb->dev for L3 processing
@ 2016-02-29 22:08 Mahesh Bandewar
  2016-03-03  4:45 ` Cong Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Mahesh Bandewar @ 2016-02-29 22:08 UTC (permalink / raw)
  To: David Miller
  Cc: Mahesh Bandewar, Eric Dumazet, netdev, Tim Hockin, Alex Pollitt,
	Matthew Dupre

From: Mahesh Bandewar <maheshb@google.com>

netif_receive_skb_core() dispatcher uses skb->dev device to send it
to the packet-handlers (e.g. ip_rcv, ipv6_rcv etc). These packet
handlers intern use the device passed to determine the net-ns to
further process these packets.  Now with the nomination logic, the
dispatcher will call netif_get_l3_dev() helper to select the device
to be used for this processing. Since l3_dev is initialized to self,
normal packet processing should not change.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
CC: Eric Dumazet <edumazet@google.com>
CC: Tim Hockin <thockin@google.com>
CC: Alex Pollitt <alex.pollitt@metaswitch.com>
CC: Matthew Dupre <matthew.dupre@metaswitch.com>
---
 net/core/dev.c       | 9 ++++++---
 net/ipv4/ip_input.c  | 5 +++--
 net/ipv6/ip6_input.c | 5 +++--
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index c4023a68cdc1..9252436ef11a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1811,7 +1811,8 @@ static inline int deliver_skb(struct sk_buff *skb,
 	if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
 		return -ENOMEM;
 	atomic_inc(&skb->users);
-	return pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
+	return pt_prev->func(skb, netif_get_l3_dev(skb->dev), pt_prev,
+			     orig_dev);
 }
 
 static inline void deliver_ptype_list_skb(struct sk_buff *skb,
@@ -1904,7 +1905,8 @@ again:
 	}
 out_unlock:
 	if (pt_prev)
-		pt_prev->func(skb2, skb->dev, pt_prev, skb->dev);
+		pt_prev->func(skb2, netif_get_l3_dev(skb->dev), pt_prev,
+			      skb->dev);
 	rcu_read_unlock();
 }
 
@@ -4157,7 +4159,8 @@ ncls:
 		if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
 			goto drop;
 		else
-			ret = pt_prev->func(skb, skb->dev, pt_prev, orig_dev);
+			ret = pt_prev->func(skb, netif_get_l3_dev(skb->dev),
+					    pt_prev, orig_dev);
 	} else {
 drop:
 		if (!deliver_exact)
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index e3d782746d9d..b47164e3e1c6 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -247,7 +247,8 @@ int ip_local_deliver(struct sk_buff *skb)
 	/*
 	 *	Reassemble IP fragments.
 	 */
-	struct net *net = dev_net(skb->dev);
+	struct net_device *dev = netif_get_l3_dev(skb->dev);
+	struct net *net = dev_net(dev);
 
 	if (ip_is_fragment(ip_hdr(skb))) {
 		if (ip_defrag(net, skb, IP_DEFRAG_LOCAL_DELIVER))
@@ -255,7 +256,7 @@ int ip_local_deliver(struct sk_buff *skb)
 	}
 
 	return NF_HOOK(NFPROTO_IPV4, NF_INET_LOCAL_IN,
-		       net, NULL, skb, skb->dev, NULL,
+		       net, NULL, skb, dev, NULL,
 		       ip_local_deliver_finish);
 }
 
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index c05c425c2389..88443ac06402 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -287,9 +287,10 @@ discard:
 
 int ip6_input(struct sk_buff *skb)
 {
+	struct net_device *dev = netif_get_l3_dev(skb->dev);
+
 	return NF_HOOK(NFPROTO_IPV6, NF_INET_LOCAL_IN,
-		       dev_net(skb->dev), NULL, skb, skb->dev, NULL,
-		       ip6_input_finish);
+		       dev_net(dev), NULL, skb, dev, NULL, ip6_input_finish);
 }
 
 int ip6_mc_input(struct sk_buff *skb)
-- 
2.7.0.rc3.207.g0ac5344

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH next 3/3] net: Use l3_dev instead of skb->dev for L3 processing
  2016-02-29 22:08 [PATCH next 3/3] net: Use l3_dev instead of skb->dev for L3 processing Mahesh Bandewar
@ 2016-03-03  4:45 ` Cong Wang
  2016-03-03 21:43   ` Mahesh Bandewar
  0 siblings, 1 reply; 11+ messages in thread
From: Cong Wang @ 2016-03-03  4:45 UTC (permalink / raw)
  To: Mahesh Bandewar
  Cc: David Miller, Mahesh Bandewar, Eric Dumazet, netdev, Tim Hockin,
	Alex Pollitt, Matthew Dupre

On Mon, Feb 29, 2016 at 2:08 PM, Mahesh Bandewar <mahesh@bandewar.net> wrote:
> From: Mahesh Bandewar <maheshb@google.com>
>
> netif_receive_skb_core() dispatcher uses skb->dev device to send it
> to the packet-handlers (e.g. ip_rcv, ipv6_rcv etc). These packet
> handlers intern use the device passed to determine the net-ns to
> further process these packets.  Now with the nomination logic, the
> dispatcher will call netif_get_l3_dev() helper to select the device
> to be used for this processing. Since l3_dev is initialized to self,
> normal packet processing should not change.
>

So, if I understand your patches correctly, _logically_ the skb is still
passed into the slave's netns via dev_forward_skb() but now goes over
the iptable rules from the default netns by only changing the netns
parameter to these hooks?

That is ugly... Logically, you should still need to continue to pass
the skb upper to the stack in default netns until ip_local_deliver_finish().

So, how about adding an iptable hook in ipvlan so that skb will
continue traverse in the original stack and then moved into slave's
netns? This might be harder since logically we need an L3 entrance
to the stack.

Thoughts?

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH next 3/3] net: Use l3_dev instead of skb->dev for L3 processing
  2016-03-03  4:45 ` Cong Wang
@ 2016-03-03 21:43   ` Mahesh Bandewar
       [not found]     ` <CAF2d9jiyjThgeQKw==tKRk3Sh0dNfzUdwJciwsVGPFLdb9uGTA@mail.gmail.com>
  0 siblings, 1 reply; 11+ messages in thread
From: Mahesh Bandewar @ 2016-03-03 21:43 UTC (permalink / raw)
  To: Cong Wang
  Cc: Mahesh Bandewar, David Miller, Eric Dumazet, netdev, Tim Hockin,
	Alex Pollitt, Matthew Dupre

On Wed, Mar 2, 2016 at 8:45 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Mon, Feb 29, 2016 at 2:08 PM, Mahesh Bandewar <mahesh@bandewar.net> wrote:
> > From: Mahesh Bandewar <maheshb@google.com>
> >
> > netif_receive_skb_core() dispatcher uses skb->dev device to send it
> > to the packet-handlers (e.g. ip_rcv, ipv6_rcv etc). These packet
> > handlers intern use the device passed to determine the net-ns to
> > further process these packets.  Now with the nomination logic, the
> > dispatcher will call netif_get_l3_dev() helper to select the device
> > to be used for this processing. Since l3_dev is initialized to self,
> > normal packet processing should not change.
> >
>
> So, if I understand your patches correctly, _logically_ the skb is still
> passed into the slave's netns via dev_forward_skb() but now goes over
> the iptable rules from the default netns by only changing the netns
> parameter to these hooks?
>
We are using different dev pointer for L3 processing than skb->dev. All
netns, routing etc, associated with this dev (l3_dev) should be used for L3.

> That is ugly... Logically, you should still need to continue to pass
> the skb upper to the stack in default netns until ip_local_deliver_finish().
>
>
> So, how about adding an iptable hook in ipvlan so that skb will
> continue traverse in the original stack and then moved into slave's
> netns? This might be harder since logically we need an L3 entrance
> to the stack.
>
> Thoughts?

As you mentioned logically we should be able to pass the skb in master's ns
until L3 processing is completed. This patch series attempts to do that by
disassociating this logic from skb->dev and adding it to l3_dev. This should
include not just IPT but all that is done in L3 phase (IPT, routing etc.)
Also since dev->l3_dev is same as dev, this should not break any existing logic.

That's the generic implementation as far as the stack is concerned and IPvlan
uses it to make the IPT hooks symmetric.

Another IPT hook may be good enough (however I haven't
given much thought to it) for IPvlan, but this generic approach will be for
whole of L3. Also currently this I have implemented for the ingress path
but that does not mean the same cannot be extended for the egress path
(in fact I'm thinking about that)

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH next 3/3] net: Use l3_dev instead of skb->dev for L3 processing
       [not found]     ` <CAF2d9jiyjThgeQKw==tKRk3Sh0dNfzUdwJciwsVGPFLdb9uGTA@mail.gmail.com>
@ 2016-03-04  0:44       ` Cong Wang
  2016-03-04  1:42         ` Mahesh Bandewar
  0 siblings, 1 reply; 11+ messages in thread
From: Cong Wang @ 2016-03-04  0:44 UTC (permalink / raw)
  To: Mahesh Bandewar
  Cc: Mahesh Bandewar, David Miller, Eric Dumazet, netdev, Tim Hockin,
	Alex Pollitt, Matthew Dupre

On Thu, Mar 3, 2016 at 3:21 PM, Mahesh Bandewar <maheshb@google.com> wrote:
>
>
> On Thu, Mar 3, 2016 at 1:43 PM, Mahesh Bandewar <maheshb@google.com> wrote:
>>
>> On Wed, Mar 2, 2016 at 8:45 PM, Cong Wang <xiyou.wangcong@gmail.com>
>> wrote:
>> >
>> > On Mon, Feb 29, 2016 at 2:08 PM, Mahesh Bandewar <mahesh@bandewar.net>
>> > wrote:
>> > > From: Mahesh Bandewar <maheshb@google.com>
>> > >
>> > > netif_receive_skb_core() dispatcher uses skb->dev device to send it
>> > > to the packet-handlers (e.g. ip_rcv, ipv6_rcv etc). These packet
>> > > handlers intern use the device passed to determine the net-ns to
>> > > further process these packets.  Now with the nomination logic, the
>> > > dispatcher will call netif_get_l3_dev() helper to select the device
>> > > to be used for this processing. Since l3_dev is initialized to self,
>> > > normal packet processing should not change.
>> > >
>> >
>> > So, if I understand your patches correctly, _logically_ the skb is still
>> > passed into the slave's netns via dev_forward_skb() but now goes over
>> > the iptable rules from the default netns by only changing the netns
>> > parameter to these hooks?
>> >
>> We are using different dev pointer for L3 processing than skb->dev. All
>> netns, routing etc, associated with this dev (l3_dev) should be used for
>> L3.
>>
>> > That is ugly... Logically, you should still need to continue to pass
>> > the skb upper to the stack in default netns until
>> > ip_local_deliver_finish().
>> >
>> >
>> > So, how about adding an iptable hook in ipvlan so that skb will
>> > continue traverse in the original stack and then moved into slave's
>> > netns? This might be harder since logically we need an L3 entrance
>> > to the stack.
>> >
>> > Thoughts?
>>
>> As you mentioned logically we should be able to pass the skb in master's
>> ns
>> until L3 processing is completed. This patch series attempts to do that by
>> disassociating this logic from skb->dev and adding it to l3_dev. This
>> should
>> include not just IPT but all that is done in L3 phase (IPT, routing etc.)
>> Also since dev->l3_dev is same as dev, this should not break any existing
>> logic.
>>
> Well, looking at the code I realized that I missed few places (especially
> routing
> logic) which continues using skb->dev in ingress path and should be
> corrected to
> use l3_dev. I'll correct those places and send the next version.


Look, even you yourself are missing something here. ;) This is exactly why
I suggest to consider another approach. Please don't introduce any code
that is hard to debug even for yourself. The struct net pointer is passed
around in kernel network subsystem almost everywhere, it is not easy to make
it bug-free by just switching skb->dev.


>> That's the generic implementation as far as the stack is concerned and
>> IPvlan
>> uses it to make the IPT hooks symmetric.
>>
>> Another IPT hook may be good enough (however I haven't
>> given much thought to it) for IPvlan, but this generic approach will be
>> for
>> whole of L3. Also currently this I have implemented for the ingress path
>> but that does not mean the same cannot be extended for the egress path
>> (in fact I'm thinking about that)
>

This is logically correct and easier to understand or debug, since IPT hook
is very common in network subsystem even ingress qdisc uses it.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH next 3/3] net: Use l3_dev instead of skb->dev for L3 processing
  2016-03-04  0:44       ` Cong Wang
@ 2016-03-04  1:42         ` Mahesh Bandewar
  2016-03-04 17:30           ` Cong Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Mahesh Bandewar @ 2016-03-04  1:42 UTC (permalink / raw)
  To: Cong Wang
  Cc: Mahesh Bandewar, David Miller, Eric Dumazet, netdev, Tim Hockin,
	Alex Pollitt, Matthew Dupre

>>> As you mentioned logically we should be able to pass the skb in master's
>>> ns
>>> until L3 processing is completed. This patch series attempts to do that by
>>> disassociating this logic from skb->dev and adding it to l3_dev. This
>>> should
>>> include not just IPT but all that is done in L3 phase (IPT, routing etc.)
>>> Also since dev->l3_dev is same as dev, this should not break any existing
>>> logic.
>>>
>> Well, looking at the code I realized that I missed few places (especially
>> routing
>> logic) which continues using skb->dev in ingress path and should be
>> corrected to
>> use l3_dev. I'll correct those places and send the next version.
>
>
> Look, even you yourself are missing something here. ;) This is exactly why
> I suggest to consider another approach. Please don't introduce any code
> that is hard to debug even for yourself. The struct net pointer is passed
> around in kernel network subsystem almost everywhere, it is not easy to make
> it bug-free by just switching skb->dev.
>
I disagree. Conceptually this is very easy to understand as we are taking L3
processing off of skb->dev and loading it onto dev->l3_dev. So
everything that is
associated with l3_dev is for L3. This should neither make debugging harder
nor add complicated code.

>
>>> That's the generic implementation as far as the stack is concerned and
>>> IPvlan
>>> uses it to make the IPT hooks symmetric.
>>>
>>> Another IPT hook may be good enough (however I haven't
>>> given much thought to it) for IPvlan, but this generic approach will be
>>> for
>>> whole of L3. Also currently this I have implemented for the ingress path
>>> but that does not mean the same cannot be extended for the egress path
>>> (in fact I'm thinking about that)
>>
>
> This is logically correct and easier to understand or debug, since IPT hook
> is very common in network subsystem even ingress qdisc uses it.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH next 3/3] net: Use l3_dev instead of skb->dev for L3 processing
  2016-03-04  1:42         ` Mahesh Bandewar
@ 2016-03-04 17:30           ` Cong Wang
  2016-03-04 22:12             ` Mahesh Bandewar
  0 siblings, 1 reply; 11+ messages in thread
From: Cong Wang @ 2016-03-04 17:30 UTC (permalink / raw)
  To: Mahesh Bandewar
  Cc: Mahesh Bandewar, David Miller, Eric Dumazet, netdev, Tim Hockin,
	Alex Pollitt, Matthew Dupre

On Thu, Mar 3, 2016 at 5:42 PM, Mahesh Bandewar <maheshb@google.com> wrote:
>>>> As you mentioned logically we should be able to pass the skb in master's
>>>> ns
>>>> until L3 processing is completed. This patch series attempts to do that by
>>>> disassociating this logic from skb->dev and adding it to l3_dev. This
>>>> should
>>>> include not just IPT but all that is done in L3 phase (IPT, routing etc.)
>>>> Also since dev->l3_dev is same as dev, this should not break any existing
>>>> logic.
>>>>
>>> Well, looking at the code I realized that I missed few places (especially
>>> routing
>>> logic) which continues using skb->dev in ingress path and should be
>>> corrected to
>>> use l3_dev. I'll correct those places and send the next version.
>>
>>
>> Look, even you yourself are missing something here. ;) This is exactly why
>> I suggest to consider another approach. Please don't introduce any code
>> that is hard to debug even for yourself. The struct net pointer is passed
>> around in kernel network subsystem almost everywhere, it is not easy to make
>> it bug-free by just switching skb->dev.
>>
> I disagree. Conceptually this is very easy to understand as we are taking L3
> processing off of skb->dev and loading it onto dev->l3_dev. So
> everything that is
> associated with l3_dev is for L3. This should neither make debugging harder
> nor add complicated code.

Nope, conceptually it is not right, that breaks _isolation_ in
concept, we need to
make each skb really traverse in the original stack, not just
switching skb->dev,
that is cheating. It is just too easy to hide bugs in your way, we
never use network
namespace in this way before.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH next 3/3] net: Use l3_dev instead of skb->dev for L3 processing
  2016-03-04 17:30           ` Cong Wang
@ 2016-03-04 22:12             ` Mahesh Bandewar
  2016-03-08  5:37               ` Cong Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Mahesh Bandewar @ 2016-03-04 22:12 UTC (permalink / raw)
  To: Cong Wang
  Cc: Mahesh Bandewar, David Miller, Eric Dumazet, netdev, Tim Hockin,
	Alex Pollitt, Matthew Dupre

On Fri, Mar 4, 2016 at 9:30 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Thu, Mar 3, 2016 at 5:42 PM, Mahesh Bandewar <maheshb@google.com> wrote:
>>>>> As you mentioned logically we should be able to pass the skb in master's
>>>>> ns
>>>>> until L3 processing is completed. This patch series attempts to do that by
>>>>> disassociating this logic from skb->dev and adding it to l3_dev. This
>>>>> should
>>>>> include not just IPT but all that is done in L3 phase (IPT, routing etc.)
>>>>> Also since dev->l3_dev is same as dev, this should not break any existing
>>>>> logic.
>>>>>
>>>> Well, looking at the code I realized that I missed few places (especially
>>>> routing
>>>> logic) which continues using skb->dev in ingress path and should be
>>>> corrected to
>>>> use l3_dev. I'll correct those places and send the next version.
>>>
>>>
>>> Look, even you yourself are missing something here. ;) This is exactly why
>>> I suggest to consider another approach. Please don't introduce any code
>>> that is hard to debug even for yourself. The struct net pointer is passed
>>> around in kernel network subsystem almost everywhere, it is not easy to make
>>> it bug-free by just switching skb->dev.
>>>
>> I disagree. Conceptually this is very easy to understand as we are taking L3
>> processing off of skb->dev and loading it onto dev->l3_dev. So
>> everything that is
>> associated with l3_dev is for L3. This should neither make debugging harder
>> nor add complicated code.
>
> Nope, conceptually it is not right, that breaks _isolation_ in
> concept, we need to
> make each skb really traverse in the original stack, not just
> switching skb->dev,
> that is cheating. It is just too easy to hide bugs in your way, we
> never use network
> namespace in this way before.

Unfortunately we don't have a way to switch to ns after L3 processing.
So I would
argue it the other-way around. The way it is now; breaks the _isolation_ model.
If the default-ns is responsible for whole L3 (in this situation) and
it does pretty well
on egress but there is no way to do that in ingress path. IPtables is
not the only thing,
how about routing, how about IPsec? None of this will function well.
So we need to
have a generic solution to solve all these problems.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH next 3/3] net: Use l3_dev instead of skb->dev for L3 processing
  2016-03-04 22:12             ` Mahesh Bandewar
@ 2016-03-08  5:37               ` Cong Wang
  2016-03-08  9:46                 ` Nicolas Dichtel
  2016-03-08 18:42                 ` Mahesh Bandewar
  0 siblings, 2 replies; 11+ messages in thread
From: Cong Wang @ 2016-03-08  5:37 UTC (permalink / raw)
  To: Mahesh Bandewar
  Cc: Mahesh Bandewar, David Miller, Eric Dumazet, netdev, Tim Hockin,
	Alex Pollitt, Matthew Dupre

On Fri, Mar 4, 2016 at 2:12 PM, Mahesh Bandewar <maheshb@google.com> wrote:
>
> Unfortunately we don't have a way to switch to ns after L3 processing.

I am totally aware of this, this is exactly why I said this might not be easy.

The question is how hard it is to implement one? My gut feeling is we only
need to change some data in skb, something similar to skb_scrub_packet().
But I never even try.

> So I would
> argue it the other-way around. The way it is now; breaks the _isolation_ model.
> If the default-ns is responsible for whole L3 (in this situation) and
> it does pretty well
> on egress but there is no way to do that in ingress path. IPtables is
> not the only thing,
> how about routing, how about IPsec? None of this will function well.
> So we need to
> have a generic solution to solve all these problems.

I don't understand why you question me this, it is you who only cares
about iptables from your cover letter for this patchset, not me.

The more subsystems involves, the more struct net pointers you
potentially need to touch, the less likely you can make it correct
by just switching skb->dev.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH next 3/3] net: Use l3_dev instead of skb->dev for L3 processing
  2016-03-08  5:37               ` Cong Wang
@ 2016-03-08  9:46                 ` Nicolas Dichtel
  2016-03-08 18:42                 ` Mahesh Bandewar
  1 sibling, 0 replies; 11+ messages in thread
From: Nicolas Dichtel @ 2016-03-08  9:46 UTC (permalink / raw)
  To: Cong Wang, Mahesh Bandewar
  Cc: Mahesh Bandewar, David Miller, Eric Dumazet, netdev, Tim Hockin,
	Alex Pollitt, Matthew Dupre, Eric W. Biederman

+ Eric B.

Le 08/03/2016 06:37, Cong Wang a écrit :
> On Fri, Mar 4, 2016 at 2:12 PM, Mahesh Bandewar <maheshb@google.com> wrote:
>>
>> Unfortunately we don't have a way to switch to ns after L3 processing.
>
> I am totally aware of this, this is exactly why I said this might not be easy.
>
> The question is how hard it is to implement one? My gut feeling is we only
> need to change some data in skb, something similar to skb_scrub_packet().
> But I never even try.

Note that Eric Biederman has made "some" patches to be able to "control" the
netns on the egress side. The goal is to be able to have routes that leak to
another netns.
It seems that the same work should probably be done at the ingress side.

I agree with Cong that just changing skb->dev is not the right approach.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH next 3/3] net: Use l3_dev instead of skb->dev for L3 processing
  2016-03-08  5:37               ` Cong Wang
  2016-03-08  9:46                 ` Nicolas Dichtel
@ 2016-03-08 18:42                 ` Mahesh Bandewar
  2016-03-10  0:09                   ` Cong Wang
  1 sibling, 1 reply; 11+ messages in thread
From: Mahesh Bandewar @ 2016-03-08 18:42 UTC (permalink / raw)
  To: Cong Wang
  Cc: Mahesh Bandewar, David Miller, Eric Dumazet, netdev, Tim Hockin,
	Alex Pollitt, Matthew Dupre

On Mon, Mar 7, 2016 at 9:37 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> On Fri, Mar 4, 2016 at 2:12 PM, Mahesh Bandewar <maheshb@google.com> wrote:
>>
>> Unfortunately we don't have a way to switch to ns after L3 processing.
>
> I am totally aware of this, this is exactly why I said this might not be easy.
>
> The question is how hard it is to implement one? My gut feeling is we only
> need to change some data in skb, something similar to skb_scrub_packet().
> But I never even try.
>
>> So I would
>> argue it the other-way around. The way it is now; breaks the _isolation_ model.
>> If the default-ns is responsible for whole L3 (in this situation) and
>> it does pretty well
>> on egress but there is no way to do that in ingress path. IPtables is
>> not the only thing,
>> how about routing, how about IPsec? None of this will function well.
>> So we need to
>> have a generic solution to solve all these problems.
>
> I don't understand why you question me this, it is you who only cares
> about iptables from your cover letter for this patchset, not me.
>
calm down! I'm not questioning you or anyone. There is problem and I would
prefer to solve it properly without cooking-up some short-term hack. The problem
reported is about IPtable and when I looked at it, I felt that it's
just a matter of
time until someone tries / uses IPsec or some routing. So I created
this solution.
I did mention about *complete* L3 processing in the cover letter but
agree that I
emphasized only on IPtables which I can correct it in the next version.

> The more subsystems involves, the more struct net pointers you
> potentially need to touch, the less likely you can make it correct
> by just switching skb->dev.

Please drop that prejudice and read the patch-set carefully. I'm
neither changing
any *net* pointers nor changing the skb->dev pointers anywhere. All I'm saying
is dev->l3_dev is what we'll use for *all* L3 processing and no need to change
skb->dev or net-ns of any device(s) involved.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH next 3/3] net: Use l3_dev instead of skb->dev for L3 processing
  2016-03-08 18:42                 ` Mahesh Bandewar
@ 2016-03-10  0:09                   ` Cong Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Cong Wang @ 2016-03-10  0:09 UTC (permalink / raw)
  To: Mahesh Bandewar
  Cc: Mahesh Bandewar, David Miller, Eric Dumazet, netdev, Tim Hockin,
	Alex Pollitt, Matthew Dupre

On Tue, Mar 8, 2016 at 10:42 AM, Mahesh Bandewar <maheshb@google.com> wrote:
>> The more subsystems involves, the more struct net pointers you
>> potentially need to touch, the less likely you can make it correct
>> by just switching skb->dev.
>
> Please drop that prejudice and read the patch-set carefully. I'm
> neither changing
> any *net* pointers nor changing the skb->dev pointers anywhere. All I'm saying
> is dev->l3_dev is what we'll use for *all* L3 processing and no need to change
> skb->dev or net-ns of any device(s) involved.

Please don't misinterpret me.

I said _switching_, not overwriting or changing, you use dev->l3_dev to _switch_
skb->dev and/or net, this is exactly what I am complaining about. This is not
how netns works.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2016-03-10  0:09 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-29 22:08 [PATCH next 3/3] net: Use l3_dev instead of skb->dev for L3 processing Mahesh Bandewar
2016-03-03  4:45 ` Cong Wang
2016-03-03 21:43   ` Mahesh Bandewar
     [not found]     ` <CAF2d9jiyjThgeQKw==tKRk3Sh0dNfzUdwJciwsVGPFLdb9uGTA@mail.gmail.com>
2016-03-04  0:44       ` Cong Wang
2016-03-04  1:42         ` Mahesh Bandewar
2016-03-04 17:30           ` Cong Wang
2016-03-04 22:12             ` Mahesh Bandewar
2016-03-08  5:37               ` Cong Wang
2016-03-08  9:46                 ` Nicolas Dichtel
2016-03-08 18:42                 ` Mahesh Bandewar
2016-03-10  0:09                   ` Cong Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.