* vnet problem (bug? feature?)
@ 2015-02-13 11:41 Toerless Eckert
  2015-02-13 23:48 ` Cong Wang
  0 siblings, 1 reply; 11+ messages in thread
From: Toerless Eckert @ 2015-02-13 11:41 UTC (permalink / raw)
  To: netdev

Hope this is the right mailing list, if not, then which one ? (thanks)

- Created vnet pair
- NOT putting them into different namespaces.
- Unicast across them works fine.
- When sending IP multicast into one end, i can not receive it on the other side
  (with normal socket API applications).

Interestingly:
- When using tcpdump (pcap), i can actually see the multicast packets
  both when binding to the sending vnet interface and when binding to
  the receiving vnet.
- When using tcpdump and sending unicast, tcpdump doesn't show me any packets ;-(

When putting one vnet interface into a separate namespace, i can send
multicast packets. I wonder if what i see is not a result of namespaces
but just how multicast packets interact with the routing table(s) (single
one when both vnet interfaces are in the same namespace).

This is unfortunately all with an old kernel (3.7). i'll try to get a newer
kernel on some PC, but i'm a bit constrained on available HW, so i thought i'd
dare to ask before having tested on the latest kernel.

(btw: Is there any API to put an interface back from a namespace into the default one ?)

Thanks!
    Toerless

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: vnet problem (bug? feature?)
  2015-02-13 11:41 vnet problem (bug? feature?) Toerless Eckert
@ 2015-02-13 23:48 ` Cong Wang
  2015-02-14 10:15   ` Toerless Eckert
  0 siblings, 1 reply; 11+ messages in thread
From: Cong Wang @ 2015-02-13 23:48 UTC (permalink / raw)
  To: Toerless Eckert; +Cc: netdev

On Fri, Feb 13, 2015 at 3:41 AM, Toerless Eckert <tte@cs.fau.de> wrote:
> Hope this is the right mailing list, if not, then which one ? (thanks)

Yes it is.

>
> - Created vnet pair
> - NOT putting them into different namespaces.
> - Unicast across them works fine.
> - When sending IP multicast into one end, i can not receive it on the other side
>   (with normal socket API applications).
>

Hmm, what does your routing table look like?

They are in the same namespace, so in the same stack, so their IP addresses
belong to the same stack.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: vnet problem (bug? feature?)
  2015-02-13 23:48 ` Cong Wang
@ 2015-02-14 10:15   ` Toerless Eckert
  2015-02-14 18:17     ` Bill Fink
  0 siblings, 1 reply; 11+ messages in thread
From: Toerless Eckert @ 2015-02-14 10:15 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev

Thanks for replying, Cong.

On Fri, Feb 13, 2015 at 03:48:14PM -0800, Cong Wang wrote:
> > - Created vnet pair
> > - NOT putting them into different namespaces.
> > - Unicast across them works fine.
> > - When sending IP multicast into one end, i can not receive it on the other side
> >   (with normal socket API applications).
> 
> Hmm, what does your routing table look like?
>
> They are in the same namespace, so in the same stack, so their IP addresses
> belong to the same stack.

Sure, but it must be possible to send/receive multicast packets to/from a specific
interface. For example link-local-scope multicast. Which works.

Just repeated with a Mint 17 (3.13 kernel), same result:

ip link add name veth1 type veth peer name veth2
ip addr add 10.0.0.1/24 dev veth1
ip addr add 10.0.0.2/24 dev veth2
ip link set dev veth1 up
ip link set dev veth2 up

Receiver socket, eg: on veth2:
   socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)
   setsockopt(SO_REUSEADDR, 1)
   bind(0.0.0.0/<port>)
   setsockopt(IP_ADD_MEMBERSHIP, 224.0.0.33/10.0.0.2)

   check with "netstat -gn" that there is IGMP membership on veth2:
   veth2           1      224.0.0.33

Sender socket, eg: on veth1:
   socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)
   setsockopt(SO_REUSEADDR, 1)
   bind(10.0.0.1/7000)
   connect(224.0.0.33/<port>)

Sending packet, check how they're transmitted:
   - TX counters on veth1 go up (ifconfig output)
   - RX counters on veth2 go up (ifconfig output)
   - tcpdump -i veth2 -P in shows packets being received
   - tcpdump -i veth1 -P out shows packets being sent
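
For reference, a minimal C sketch of the two test programs described above
(untested, error handling omitted; "5000" is only a stand-in for the <port>
placeholder, the other addresses and port 7000 are the values from this setup):

/* build: cc -o mctest mctest.c; run "./mctest recv", then "./mctest send" */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int s = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    int one = 1;

    setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

    if (argc > 1 && !strcmp(argv[1], "send")) {
        /* sender on veth1: bind 10.0.0.1:7000, connect to 224.0.0.33:5000 */
        struct sockaddr_in src = { .sin_family = AF_INET, .sin_port = htons(7000) };
        struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(5000) };

        src.sin_addr.s_addr = inet_addr("10.0.0.1");
        dst.sin_addr.s_addr = inet_addr("224.0.0.33");
        bind(s, (struct sockaddr *)&src, sizeof(src));
        connect(s, (struct sockaddr *)&dst, sizeof(dst));
        send(s, "hello", 5, 0);
    } else {
        /* receiver on veth2: bind 0.0.0.0:5000, join 224.0.0.33 on 10.0.0.2 */
        struct sockaddr_in any = { .sin_family = AF_INET, .sin_port = htons(5000) };
        struct ip_mreq mreq;
        char buf[1500];
        ssize_t n;

        any.sin_addr.s_addr = htonl(INADDR_ANY);
        bind(s, (struct sockaddr *)&any, sizeof(any));
        mreq.imr_multiaddr.s_addr = inet_addr("224.0.0.33");
        mreq.imr_interface.s_addr = inet_addr("10.0.0.2");
        setsockopt(s, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));
        n = recv(s, buf, sizeof(buf), 0);
        printf("received %zd bytes\n", n);
    }
    close(s);
    return 0;
}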

Played around with lots of parameters:
   - same behavior for non-link-local-scope multicast, TTL > 1 doesn't help.
   - same behavior if setting "multicast", "allmulticast", "promiscuous" on the veth
   - same behavior when setting IP_MULTICAST_LOOP on sender.

Routing table:
netstat -r -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
0.0.0.0         192.168.1.254   0.0.0.0         UG        0 0          0 eth1
10.0.0.0        0.0.0.0         255.255.255.0   U         0 0          0 veth1
10.0.0.0        0.0.0.0         255.255.255.0   U         0 0          0 veth2
192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth1

And of course it works if one side is put into a separate namespace,
but that doesn't help me.

But: it really seems to be a problem with the kernel/sockets, not with veth.
Just replaced the veth pair with a pair of ethernets with a loopback cable and
pretty much exactly the same result (except that receiver side does not see
packets in RX unless it's promiscuous or has a real receiver socket, but that's
perfect). But not being a veth problem but other kernel network stack "feature"
doesn't make it right IMHO. I can't see by which "logic" the receiver socket
seemingly does not care about these packets even though it's explicitly bound
to the interface and the multicast group. "Gimme the darn packets, socket,
they are received on the interface"! ;-))

I can play around with the receiver side socket API call details, but i really
don't see why those should be different if the packets happen to be looped
than if they're not.

Cheers
    Toerless

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: vnet problem (bug? feature?)
  2015-02-14 10:15   ` Toerless Eckert
@ 2015-02-14 18:17     ` Bill Fink
  2015-02-15 10:12       ` Toerless Eckert
  2015-02-15 19:00       ` Toerless Eckert
  0 siblings, 2 replies; 11+ messages in thread
From: Bill Fink @ 2015-02-14 18:17 UTC (permalink / raw)
  To: Toerless Eckert; +Cc: Cong Wang, netdev

On Sat, 14 Feb 2015, Toerless Eckert wrote:

> Thanks for replying, Cong.
> 
> On Fri, Feb 13, 2015 at 03:48:14PM -0800, Cong Wang wrote:
> > > - Created vnet pair
> > > - NOT putting them into different namespaces.
> > > - Unicast across them works fine.
> > > - When sending IP multicast into one end, i can not receive it on the other side
> > >   (with normal socket API applications).
> > 
> > Hmm, what does your routing table look like?
> >
> > They are in the same namespace, so in the same stack, so their IP addresses
> > belong to the same stack.
> 
> Sure, but it must be possible to send/receive multicast packets to/from a specific
> interface. For example link-local-scope multicast. Which works.
> 
> Just repeated with a mint 17, 3.13 kernel, same result:
> 
> ip link add name veth1 type veth peer name veth2
> ip addr add 10.0.0.1/24 dev veth1
> ip addr add 10.0.0.2/24 dev veth2
> ip link set dev veth1 up
> ip link set dev veth2 up

Did you try disabling reverse path filtering:

echo 0 > /proc/sys/net/ipv4/conf/veth1/rp_filter
echo 0 > /proc/sys/net/ipv4/conf/veth2/rp_filter

Both veth1 and veth2 are in the same subnet, but only one
(presumably veth1) is the expected source for packets coming
from net 10, so when the multicast packets from a net 10
source arrive on veth2, they are rejected for arriving
on the wrong interface.
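
To make that concrete, here is a toy illustration (plain user-space C, not the
kernel's actual code) of what a strict reverse-path check does: look up the
route back to the packet's source address and drop the packet if that route
does not point out the interface it arrived on. Assuming the duplicate
10.0.0.0/24 route resolves to veth1, a packet from 10.0.0.1 arriving on veth2
fails the check:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct route { uint32_t net, mask; const char *ifname; };

/* longest-prefix match over a tiny static table */
static const char *route_back(const struct route *tbl, int n, uint32_t src)
{
    const char *best = NULL;
    uint32_t best_mask = 0;
    int i;

    for (i = 0; i < n; i++)
        if ((src & tbl[i].mask) == tbl[i].net && tbl[i].mask >= best_mask) {
            best = tbl[i].ifname;
            best_mask = tbl[i].mask;
        }
    return best;
}

int main(void)
{
    struct route tbl[] = {
        { 0x0a000000u, 0xffffff00u, "veth1" },  /* 10.0.0.0/24    -> veth1 */
        { 0xc0a80100u, 0xffffff00u, "eth1"  },  /* 192.168.1.0/24 -> eth1  */
    };
    uint32_t src = 0x0a000001u;                 /* source 10.0.0.1 ...     */
    const char *in_if = "veth2";                /* ... arriving on veth2   */
    const char *rev = route_back(tbl, 2, src);
    bool ok = rev && !strcmp(rev, in_if);

    printf("reverse path via %s, arrived on %s -> %s\n",
           rev ? rev : "none", in_if, ok ? "accept" : "drop (rp_filter)");
    return 0;
}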

You could check this with "nstat -z | grep -i filter".

The above is an educated guess on my part, and could
be something completely different.

					-Bill



> Receiver socket, eg: on veth2:
>    socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)
>    setsockopt(SO_REUSEADDR, 1)
>    bind(0.0.0.0/<port>)
>    setsockopt(IP_ADD_MEMBERSHIP, 224.0.0.33/10.0.0.2)
> 
>    check with "netstat -gn" that there is IGMP membership on veth2:
>    veth2           1      224.0.0.33
> 
> Sender socket, eg: on veth1:
>    socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)
>    setsockopt(SO_REUSEADDR, 1)
>    bind(10.0.0.1/7000)
>    connect(224.0.0.33/<port>)
> 
> Sending packet, check how they're transmitted:
>    - TX counters on veth1 go up (ifconfig output)
>    - RX counters on veth2 go up (ifconfig output)
>    - tcpdump -i veth2 -P in shows packets being received
>    - tcpdump -i veth1 -P out shows packets being sent
> 
> Played around with lots of parameters:
>    - same behavior for non-link-local-scope multicast, TTL > 1 doesn't help.
>    - same behavior if setting "multicast", "allmulticast", "promiscuous" on the veth
>    - same behavior when setting IP_MULTICAST_LOOP on sender.
> 
> Routing table:
> netstat -r -n
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
> 0.0.0.0         192.168.1.254   0.0.0.0         UG        0 0          0 eth1
> 10.0.0.0        0.0.0.0         255.255.255.0   U         0 0          0 veth1
> 10.0.0.0        0.0.0.0         255.255.255.0   U         0 0          0 veth2
> 192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth1
> 
> And of course it works if one side is put into a separate namespace,
> but that doesn't help me.
> 
> But: it really seems to be a problem with the kernel/sockets, not with veth.
> Just replaced the veth pair with a pair of ethernets with a loopback cable and
> pretty much exactly the same result (except that receiver side does not see
> packets in RX unless it's promiscuous or has a real receiver socket, but that's
> perfect). But not being a veth problem but other kernel network stack "feature"
> doesn't make it right IMHO. I can't see by which "logic" the receiver socket
> seemingly does not care about these packets even though it's explicitly bound
> to the interface and the multicast group. "Gimme the darn packets, socket,
> they are received on the interface"! ;-))
> 
> I can play around with the receiver side socket API call details, but i really
> don't see why those should be different if the packets happen to be looped
> than if they're not.
> 
> Cheers
>     Toerless

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: vnet problem (bug? feature?)
  2015-02-14 18:17     ` Bill Fink
@ 2015-02-15 10:12       ` Toerless Eckert
  2015-02-15 19:00       ` Toerless Eckert
  1 sibling, 0 replies; 11+ messages in thread
From: Toerless Eckert @ 2015-02-15 10:12 UTC (permalink / raw)
  To: Bill Fink; +Cc: Cong Wang, netdev

Thanks, Bill.

Unfortunately, that kernel var didn't do the trick. I had tried to reverse
the direction before too, so if the namespace was doing an RPF check and accepting
from only one interface, then one direction should have worked.

Cheers
    Toerless

On Sat, Feb 14, 2015 at 01:17:44PM -0500, Bill Fink wrote:
> On Sat, 14 Feb 2015, Toerless Eckert wrote:
> 
> > Thanks for replying, Cong.
> > 
> > On Fri, Feb 13, 2015 at 03:48:14PM -0800, Cong Wang wrote:
> > > > - Created vnet pair
> > > > - NOT putting them into different namespaces.
> > > > - Unicast across them works fine.
> > > > - When sending IP multicast into one end, i can not receive it on the other side
> > > >   (with normal socket API applications).
> > > 
> > > Hmm, what does your routing table look like?
> > >
> > > They are in the same namespace, so in the same stack, so their IP addresses
> > > belong to the same stack.
> > 
> > Sure, but it must be possible to send/receive multicast packets to/from a specific
> > interface. For example link-local-scope multicast. Which works.
> > 
> > Just repeated with a mint 17, 3.13 kernel, same result:
> > 
> > ip link add name veth1 type veth peer name veth2
> > ip addr add 10.0.0.1/24 dev veth1
> > ip addr add 10.0.0.2/24 dev veth2
> > ip link set dev veth1 up
> > ip link set dev veth2 up
> 
> Did you try disabling reverse path filtering:
> 
> echo 0 > /proc/sys/net/ipv4/conf/veth1/rp_filter
> echo 0 > /proc/sys/net/ipv4/conf/veth2/rp_filter
> 
> Both veth1 and veth2 are in the same subnet, but only one
> (presumably veth1) is the expected source for packets coming
> from net 10, so when the multicast packets from a net 10
> source arrive on veth2, they are rejected for arriving
> on the wrong interface.
> 
> You could check this with "nstat -z | grep -i filter".
> 
> The above is an educated guess on my part, and could
> be something completely different.
> 
> 					-Bill
> 
> 
> 
> > Receiver socket, eg: on veth2:
> >    socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)
> >    setsockopt(SO_REUSEADDR, 1)
> >    bind(0.0.0.0/<port>)
> >    setsockopt(IP_ADD_MEMBERSHIP, 224.0.0.33/10.0.0.2)
> > 
> >    check with "netstat -gn" that there is IGMP membership on veth2:
> >    veth2           1      224.0.0.33
> > 
> > Sender socket, eg: on veth1:
> >    socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)
> >    setsockopt(SO_REUSEADDR, 1)
> >    bind(10.0.0.1/7000)
> >    connect(224.0.0.33/<port>)
> > 
> > Sending packet, check how they're transmitted:
> >    - TX counters on veth1 go up (ifconfig output)
> >    - RX counters on veth2 go up (ifconfig output)
> >    - tcpdump -i veth2 -P in shows packets being received
> >    - tcpdump -i veth1 -P out shows packets being sent
> > 
> > Played around with lots of parameters:
> >    - same behavior for non-link-local-scope multicast, TTL > 1 doesn't help.
> >    - same behavior if setting "multicast", "allmulticast", "promiscuous" on the veth
> >    - same behavior when setting IP_MULTICAST_LOOP on sender.
> > 
> > Routing table:
> > netstat -r -n
> > Kernel IP routing table
> > Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
> > 0.0.0.0         192.168.1.254   0.0.0.0         UG        0 0          0 eth1
> > 10.0.0.0        0.0.0.0         255.255.255.0   U         0 0          0 veth1
> > 10.0.0.0        0.0.0.0         255.255.255.0   U         0 0          0 veth2
> > 192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth1
> > 
> > And of course it works if one side is put into a separate namespace,
> > but that doesn't help me.
> > 
> > But: it really seems to be a problem with the kernel/sockets, not with veth.
> > Just replaced the veth pair with a pair of ethernets with a loopback cable and
> > pretty much exactly the same result (except that receiver side does not see
> > packets in RX unless it's promiscuous or has a real receiver socket, but that's
> > perfect). But not being a veth problem but other kernel network stack "feature"
> > doesn't make it right IMHO. I can't see by which "logic" the receiver socket
> > seemingly does not care about these packets even though it's explicitly bound
> > to the interface and the multicast group. "Gimme the darn packets, socket,
> > they are received on the interface"! ;-))
> > 
> > I can play around with the receiver side socket API call details, but i really
> > don't see why those should be different if the packets happen to be looped
> > than if they're not.
> > 
> > Cheers
> >     Toerless

-- 
---
Toerless.Eckert@informatik.uni-erlangen.de
/C=de/A=d400/P=uni-erlangen/OU=informatik/S=Eckert/G=Toerless/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: vnet problem (bug? feature?)
  2015-02-14 18:17     ` Bill Fink
  2015-02-15 10:12       ` Toerless Eckert
@ 2015-02-15 19:00       ` Toerless Eckert
  2015-02-15 21:16         ` Sowmini Varadhan
  2015-02-16 21:51         ` Bill Fink
  1 sibling, 2 replies; 11+ messages in thread
From: Toerless Eckert @ 2015-02-15 19:00 UTC (permalink / raw)
  To: Bill Fink; +Cc: Cong Wang, netdev

*Bingo* rp_filter did the trick.

nstat is fairly useless to figure this out, no RPF counters.

Quite strange to see rp_filter. Especially for multicast. But i haven't
followed linux for many years in this level of detail. I thought
Linux was always weak host model. But even for strong host model,
i can't remember that RPF checking was done in the past (for hosts,
not routers obviously).

Cheers
     Toerless

On Sat, Feb 14, 2015 at 01:17:44PM -0500, Bill Fink wrote:
> > ip link add name veth1 type veth peer name veth2
> > ip addr add 10.0.0.1/24 dev veth1
> > ip addr add 10.0.0.2/24 dev veth2
> > ip link set dev veth1 up
> > ip link set dev veth2 up
> 
> Did you try disabling reverse path filtering:
> 
> echo 0 > /proc/sys/net/ipv4/conf/veth1/rp_filter
> echo 0 > /proc/sys/net/ipv4/conf/veth2/rp_filter
> 
> Both veth1 and veth2 are in the same subnet, but only one
> (presumably veth1) is the expected source for packets coming
> from net 10, so when the multicast packets from a net 10
> source arrive on veth2, they are rejected for arriving
> on the wrong interface.
> 
> You could check this with "nstat -z | grep -i filter".
> 
> The above is an educated guess on my part, and could
> be something completely different.
> 
> 					-Bill
> 
> 
> 
> > Receiver socket, eg: on veth2:
> >    socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)
> >    setsockopt(SO_REUSEADDR, 1)
> >    bind(0.0.0.0/<port>)
> >    setsockopt(IP_ADD_MEMBERSHIP, 224.0.0.33/10.0.0.2)
> > 
> >    check with "netstat -gn" that there is IGMP membership on veth2:
> >    veth2           1      224.0.0.33
> > 
> > Sender socket, eg: on veth1:
> >    socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)
> >    setsockopt(SO_REUSEADDR, 1)
> >    bind(10.0.0.1/7000)
> >    connect(224.0.0.33/<port>)
> > 
> > Sending packet, check how they're transmitted:
> >    - TX counters on veth1 go up (ifconfig output)
> >    - RX counters on veth2 go up (ifconfig output)
> >    - tcpdump -i veth2 -P in shows packets being received
> >    - tcpdump -i veth1 -P out shows packets being sent
> > 
> > Played around with lots of parameters:
> >    - same behavior for non-link-local-scope multicast, TTL > 1 doesn't help.
> >    - same behavior if setting "multicast", "allmulticast", "promiscuous" on the veth
> >    - same behavior when setting IP_MULTICAST_LOOP on sender.
> > 
> > Routing table:
> > netstat -r -n
> > Kernel IP routing table
> > Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
> > 0.0.0.0         192.168.1.254   0.0.0.0         UG        0 0          0 eth1
> > 10.0.0.0        0.0.0.0         255.255.255.0   U         0 0          0 veth1
> > 10.0.0.0        0.0.0.0         255.255.255.0   U         0 0          0 veth2
> > 192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth1
> > 
> > And of course it works if one side is put into a separate namespace,
> > but that doesn't help me.
> > 
> > But: it really seems to be a problem with the kernel/sockets, not with veth.
> > Just replaced the veth pair with a pair of ethernets with a loopback cable and
> > pretty much exactly the same result (except that receiver side does not see
> > packets in RX unless it's promiscuous or has a real receiver socket, but that's
> > perfect). But not being a veth problem but other kernel network stack "feature"
> > doesn't make it right IMHO. I can't see by which "logic" the receiver socket
> > seemingly does not care about these packets even though it's explicitly bound
> > to the interface and the multicast group. "Gimme the darn packets, socket,
> > they are received on the interface"! ;-))
> > 
> > I can play around with the receiver side socket API call details, but i really
> > don't see why those should be different if the packets happen to be looped
> > than if they're not.
> > 
> > Cheers
> >     Toerless

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: vnet problem (bug? feature?)
  2015-02-15 19:00       ` Toerless Eckert
@ 2015-02-15 21:16         ` Sowmini Varadhan
  2015-02-16 10:13           ` Toerless Eckert
  2015-02-16 21:51         ` Bill Fink
  1 sibling, 1 reply; 11+ messages in thread
From: Sowmini Varadhan @ 2015-02-15 21:16 UTC (permalink / raw)
  To: Toerless Eckert; +Cc: Bill Fink, Cong Wang, netdev

On Sun, Feb 15, 2015 at 2:00 PM, Toerless Eckert <tte@cs.fau.de> wrote:
> *Bingo* rp_filter did the trick.
>
> nstat is fairly useless to figure this out, no RPF counters.
>
> Quite strange to see rp_filter. Especially for multicast. But i haven't
> followed linux for many years in this level of detail. I thought
> Linux was always weak host model. But even for strong host model,
> i can't remember that RPF checking was done in the past (for hosts,
> not routers obviously).

RPF !=  strong/weak ES models defined in Section 3.3.4.2 of rfc1122.

RPF is about ingress filtering (rfc 3704) and verifying that the return
path to the src addr of the packet would go out on the same interface
it came on. The wiki page on Reverse_path_forwarding has some detail.
--Sowmini

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: vnet problem (bug? feature?)
  2015-02-15 21:16         ` Sowmini Varadhan
@ 2015-02-16 10:13           ` Toerless Eckert
  2015-02-16 15:30             ` David Ahern
  2015-02-16 19:54             ` David Miller
  0 siblings, 2 replies; 11+ messages in thread
From: Toerless Eckert @ 2015-02-16 10:13 UTC (permalink / raw)
  To: Sowmini Varadhan; +Cc: Bill Fink, Cong Wang, netdev

On Sun, Feb 15, 2015 at 04:16:21PM -0500, Sowmini Varadhan wrote:
> RPF !=  strong/weak ES models defined in Section 3.3.4.2 of rfc1122.

Agreed on the RFC definition, but not on the model. RP filtering makes
it more difficult, if not impossible, to be a weak host. Consider the multi-homed
host that's attached such that it would receive packets for one of its addresses
from different interfaces. RP filtering throws away those packets from all but
one interface (just talking unicast here for the sake of the argument).

> RPF is about ingress filtering (rfc 3704) and verifying that the return
> path to the src addr of the packet would go out on the same interface
> it came on. The wiki page on Reverse_path_forwarding has some detail.

rfc3704 mentions multicast only in passing, so i would claim Fred was
primarily thinking about unicast, and the whole text is also targeted at ISPs
== routers, not at RPF filtering on actual multi-homed hosts.

Of course, RPF filtering for multicast has been traditionally used in
almost all relevant routing protocols, but again: that's only on routers,
and AFAIK in the distant past not on MHH.

I fail to find a good reference explaining why linux would default to
rp_filtering = 1 (more appropriate for routers) even if forwarding defaults to 0
(more appropriate for multi-homed hosts). 

Any ideas how to track back where this  choice came from ? 

Thanks!
    Toerless

> --Sowmini

-- 
---
Toerless.Eckert@informatik.uni-erlangen.de
/C=de/A=d400/P=uni-erlangen/OU=informatik/S=Eckert/G=Toerless/

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: vnet problem (bug? feature?)
  2015-02-16 10:13           ` Toerless Eckert
@ 2015-02-16 15:30             ` David Ahern
  2015-02-16 19:54             ` David Miller
  1 sibling, 0 replies; 11+ messages in thread
From: David Ahern @ 2015-02-16 15:30 UTC (permalink / raw)
  To: Toerless Eckert, Sowmini Varadhan; +Cc: Bill Fink, Cong Wang, netdev

On 2/16/15 3:13 AM, Toerless Eckert wrote:
> I fail to find a good reference explaining why linux would default to
> rp_filtering = 1 (more appropriate for routers) even if forwarding defaults to 0
> (more appropriate for multi-homed hosts).

It's a userspace default. For Fedora/Red Hat based systems see 
/etc/sysctl.conf (older releases) and /usr/lib/sysctl.d/50-default.conf 
(newer ones).

I recall it defaulting to 1 in the early 2000's so it has been that way 
for a long time.

From Documentation/networking/ip-sysctl.txt:

rp_filter - INTEGER
...
         Default value is 0. Note that some distributions enable it
         in startup scripts.

David

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: vnet problem (bug? feature?)
  2015-02-16 10:13           ` Toerless Eckert
  2015-02-16 15:30             ` David Ahern
@ 2015-02-16 19:54             ` David Miller
  1 sibling, 0 replies; 11+ messages in thread
From: David Miller @ 2015-02-16 19:54 UTC (permalink / raw)
  To: tte; +Cc: sowmini05, billfink, cwang, netdev

From: Toerless Eckert <tte@cs.fau.de>
Date: Mon, 16 Feb 2015 11:13:10 +0100

> I fail to find a good reference explaining why linux would default to
> rp_filtering = 1 (more appropriate for routers) even if forwarding defaults to 0
> (more appropriate for multi-homed hosts). 
> 
> Any ideas how to track back where this  choice came from ? 

"Linux", ie. the kernel, does not default to '1' for rp_filtering.

The distributions are setting it to a non-zero value via
/etc/sysctl.conf or similar, and I've always said that I consider it
an extremely poor decision, as reverse path filtering is completely
pointless on an end host.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: vnet problem (bug? feature?)
  2015-02-15 19:00       ` Toerless Eckert
  2015-02-15 21:16         ` Sowmini Varadhan
@ 2015-02-16 21:51         ` Bill Fink
  1 sibling, 0 replies; 11+ messages in thread
From: Bill Fink @ 2015-02-16 21:51 UTC (permalink / raw)
  To: Toerless Eckert; +Cc: Cong Wang, netdev

On Sun, 15 Feb 2015, Toerless Eckert wrote:

> *Bingo* rp_filter did the trick.
> 
> nstat is fairly useless to figure this out, no RPF counters.

In theory, I believe it should show up:

wizard% nstat -az | grep IPReversePathFilter
TcpExtIPReversePathFilter       0                  0.0

Strange it shows up under TcpExt rather than IpExt.

The Linux MIB counter is LINUX_MIB_IPRPFILTER, set in
ip_rcv_finish().

The initial commit by Eric Dumazet introducing LINUX_MIB_IPRPFILTER
indicated it was only tested for unicast, so perhaps there could
be an issue with multicast reception in some cases, but if not
you should see that MIB counter increasing when you run your tests.

					-Bill



> Quite strange to see rp_filter. Especially for multicast. But i haven't
> followed linux for many years in this level of detail. I thought
> Linux was always weak host model. But even for strong host model,
> i can't remember that RPF checking was done in the past (for hosts,
> not routers obviously).
> 
> Cheers
>      Toerless
> 
> On Sat, Feb 14, 2015 at 01:17:44PM -0500, Bill Fink wrote:
> > > ip link add name veth1 type veth peer name veth2
> > > ip addr add 10.0.0.1/24 dev veth1
> > > ip addr add 10.0.0.2/24 dev veth2
> > > ip link set dev veth1 up
> > > ip link set dev veth2 up
> > 
> > Did you try disabling reverse path filtering:
> > 
> > echo 0 > /proc/sys/net/ipv4/conf/veth1/rp_filter
> > echo 0 > /proc/sys/net/ipv4/conf/veth2/rp_filter
> > 
> > Both veth1 and veth2 are in the same subnet, but only one
> > (presumably veth1) is the expected source for packets coming
> > from net 10, so when the multicast packets from a net 10
> > source arrive on veth2, they are rejected for arriving
> > on the wrong interface.
> > 
> > You could check this with "nstat -z | grep -i filter".
> > 
> > The above is an educated guess on my part, and could
> > be something completely different.
> > 
> > 					-Bill
> > 
> > 
> > 
> > > Receiver socket, eg: on veth2:
> > >    socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)
> > >    setsockopt(SO_REUSEADDR, 1)
> > >    bind(0.0.0.0/<port>)
> > >    setsockopt(IP_ADD_MEMBERSHIP, 224.0.0.33/10.0.0.2)
> > > 
> > >    check with "netstat -gn" that there is IGMP membership on veth2:
> > >    veth2           1      224.0.0.33
> > > 
> > > Sender socket, eg: on veth1:
> > >    socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP)
> > >    setsockopt(SO_REUSEADDR, 1)
> > >    bind(10.0.0.1/7000)
> > >    connect(224.0.0.33/<port>)
> > > 
> > > Sending packet, check how they're transmitted:
> > >    - TX counters on veth1 go up (ifconfig output)
> > >    - RX counters on veth2 go up (ifconfig output)
> > >    - tcpdump -i veth2 -P in shows packets being received
> > >    - tcpdump -i veth1 -P out shows packets being sent
> > > 
> > > Played around with lots of parameters:
> > >    - same behavior for non-link-local-scope multicast, TTL > 1 doesn't help.
> > >    - same behavior if setting "multicast", "allmulticast", "promiscuous" on the veth
> > >    - same behavior when setting IP_MULTICAST_LOOP on sender.
> > > 
> > > Routing table:
> > > netstat -r -n
> > > Kernel IP routing table
> > > Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
> > > 0.0.0.0         192.168.1.254   0.0.0.0         UG        0 0          0 eth1
> > > 10.0.0.0        0.0.0.0         255.255.255.0   U         0 0          0 veth1
> > > 10.0.0.0        0.0.0.0         255.255.255.0   U         0 0          0 veth2
> > > 192.168.1.0     0.0.0.0         255.255.255.0   U         0 0          0 eth1
> > > 
> > > And of course it works if one side is put into a separate namespace,
> > > but that doesn't help me.
> > > 
> > > But: it really seems to be a problem with the kernel/sockets, not with veth.
> > > Just replaced the veth pair with a pair of ethernets with a loopback cable and
> > > pretty much exactly the same result (except that receiver side does not see
> > > packets in RX unless it's promiscuous or has a real receiver socket, but that's
> > > perfect). But not being a veth problem but other kernel network stack "feature"
> > > doesn't make it right IMHO. I can't see by which "logic" the receiver socket
> > > seemingly does not care about these packets even though it's explicitly bound
> > > to the interface and the multicast group. "Gimme the darn packets, socket,
> > > they are received on the interface"! ;-))
> > > 
> > > I can play around with the receiver side socket API call details, but i really
> > > don't see why those should be different if the packets happen to be looped
> > > than if they're not.
> > > 
> > > Cheers
> > >     Toerless

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-02-16 21:51 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-13 11:41 vnet problem (bug? feature?) Toerless Eckert
2015-02-13 23:48 ` Cong Wang
2015-02-14 10:15   ` Toerless Eckert
2015-02-14 18:17     ` Bill Fink
2015-02-15 10:12       ` Toerless Eckert
2015-02-15 19:00       ` Toerless Eckert
2015-02-15 21:16         ` Sowmini Varadhan
2015-02-16 10:13           ` Toerless Eckert
2015-02-16 15:30             ` David Ahern
2015-02-16 19:54             ` David Miller
2015-02-16 21:51         ` Bill Fink
